Tuesday, December 18, 2007

Mod-direction

If levels of indirection in our lives were as prevalent as those found in information technology, life would be pretty tedious and it would be an awful lot of work to get even small things done. Yet levels of indirection are par for the course in technology. The abstractions they engender form a patchwork that gives us the applications we are accustomed to using, everything from desktop applications to the World Wide Web.

For example, we speak of 32 bit processors and, increasingly, 64 bit processors, and it is mostly understood by those in technology that this delimits addressable memory. This means that your 32 bit system (probably, at the time this was written) can address 2^32 bytes of memory, or more specifically 4 gigabytes, in the context of a single application. In the case of a 64 bit processor this rises to 2^64 bytes. That's:

18,446,744,073,709,551,616

So roughly a 4.3 billion fold (2^32) increase over what a 32 bit processor can address.

But it turns out that other "bits" dictate the bytes that make up the data and code associated with our applications. The Intel x86 architecture, for example, has 16 bit segment registers whose contents are called selectors. Thirteen of those bits are actually an index into a table the processor keeps, and another bit dictates which table to do the lookup against. A single bit has two states, 0 or 1, which means there are two tables. These happen to be called the Global Descriptor Table and the Local Descriptor Table. 2^13 gives 8,192 possibilities per table, which means there are 16,384 possible initial paths before the 32 or 64 bits that people typically speak of take on context - in the form of either code to be executed or data to be manipulated. To spell this out for a 64 bit processor, this means:

18,446,744,073,709,551,616 x 16,384 logical paths to code and data
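
To make the bit twiddling concrete, here is a small sketch in Java that pulls apart a selector the way the processor would and works out the raw numbers above. The selector value is made up purely for illustration:

import java.math.BigInteger;

// Sketch: decoding a hypothetical x86 segment selector and sizing the address spaces.
public class SelectorDemo {
    public static void main(String[] args) {
        int selector = 0x0073;                  // made-up 16 bit selector value

        int index = (selector >> 3) & 0x1FFF;   // 13 bits: index into a descriptor table
        int table = (selector >> 2) & 0x1;      // 1 bit: 0 = GDT, 1 = LDT
        // (the remaining low 2 bits hold a privilege level, not discussed above)

        System.out.println("index=" + index + " table=" + (table == 0 ? "GDT" : "LDT"));

        long initialPaths = (1L << 13) * 2;     // 8,192 entries per table x 2 tables = 16,384

        long bytes32 = 1L << 32;                             // 4,294,967,296
        BigInteger bytes64 = BigInteger.valueOf(2).pow(64);  // 18,446,744,073,709,551,616

        System.out.println("32 bit address space: " + bytes32 + " bytes");
        System.out.println("64 bit address space: " + bytes64 + " bytes");
        System.out.println("logical paths: " + bytes64.multiply(BigInteger.valueOf(initialPaths)));
    }
}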

To quote Sir Isaac Newton, "If I have seen further it is by standing on the shoulders of giants."

Much of what we take for granted in our everyday lives is attributable to "behind the scenes" machinations seemingly akin to the caricatures of mouse trap contraptions that have played themselves out in pop culture, from cartoons to board games. From the power grid to the water flowing through our lavatories to the Internet, there is a great deal going on behind the conveniences we summon with the flick of a finger.

A router in a home masks the fact that a user has several machines behind a single IP address. This many-to-one relationship is a level of indirection. This indirection, redirection or, in the case of malice, misdirection (think botnets), happens at a level that both endpoints are oblivious to, e.g., a web browser and a web server.
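
One way to picture that many-to-one relationship is the translation table the router keeps. The addresses and ports in this Java sketch are invented purely for illustration:

import java.util.HashMap;
import java.util.Map;

// Sketch: the many-to-one translation table a home router maintains (NAT).
// All addresses and ports here are invented for illustration.
public class NatTableSketch {
    public static void main(String[] args) {
        // internal host:port -> the port used on the router's single public address
        Map<String, Integer> translations = new HashMap<>();
        translations.put("192.168.1.10:49152", 30001); // laptop talking to a web server
        translations.put("192.168.1.11:49153", 30002); // desktop talking to the same server

        // A reply arriving on public port 30002 goes back to 192.168.1.11:49153;
        // neither the browser nor the web server ever sees this table.
        translations.forEach((inside, publicPort) ->
                System.out.println(inside + " appears to the world as 203.0.113.5:" + publicPort));
    }
}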

Things get interesting when decision making (indirection) happens at the initial stage of browsing a web site, independent of the indirection happening at your home router. It may turn out that the initial chatter your web browser conducts with your favorite web site is really with a load balancing device that makes decisions based on the URLs you are requesting. So when you hit www.my-favorite-site.com/search/query.jsp, the device keys off of "search" in the URL and always sends these page requests to one or more web servers dedicated to searching the knowledge space of that site. To this end, products from F5 are well known in the industry. This is often known as layer 7 load balancing. Layer 7 is the topmost layer of the OSI model of networking and is called the application layer since it is closest to the abstractions that make up an application - in the example given, the request to a search page for information.
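
Stripped of all vendor specifics, the decision such a device makes is no more than this kind of test. The pool names in this sketch are hypothetical, not any vendor's syntax:

// Sketch: a layer 7 routing decision keyed off the request URL.
// The pool names and the "/search" prefix are hypothetical.
public class Layer7Router {
    static String choosePool(String requestPath) {
        if (requestPath.startsWith("/search")) {
            return "search-server-pool";   // servers dedicated to search requests
        }
        return "general-web-pool";         // everything else
    }

    public static void main(String[] args) {
        System.out.println(choosePool("/search/query.jsp")); // search-server-pool
        System.out.println(choosePool("/index.html"));       // general-web-pool
    }
}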

A home router works at layer 3. All these layers are levels of indirection that build on top of each other to facilitate the applications that are familiar to any contemporary Net user, from IM to web browsers to Bittorrent clients.

Another potential decision point for web servers is the kind of HTTP method a client is using. The HTTP protocol allows a client, such as a web browser, to make different types of requests to a web server. The overwhelming majority of requests made against sites are GET requests issued by web browsers, and the method means what it implies: get me a page. The second most common HTTP method is the POST method. This method is usually used with forms, so any time you've entered your name, address and credit card and hit the Buy button, odds are overwhelming that an HTTP POST was at work.
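
From a client's point of view the two methods look something like the following Java sketch, with a placeholder URL and form fields:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: issuing a GET and a POST the way a browser would.
// The URL and form fields are placeholders.
public class GetVsPost {
    public static void main(String[] args) throws Exception {
        // GET: "get me a page"
        HttpURLConnection get = (HttpURLConnection)
                new URL("http://www.example.com/index.html").openConnection();
        get.setRequestMethod("GET");
        System.out.println("GET status: " + get.getResponseCode());

        // POST: what typically runs when you hit the Buy button on a form
        HttpURLConnection post = (HttpURLConnection)
                new URL("http://www.example.com/checkout").openConnection();
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] form = "name=Jane+Doe&card=0000".getBytes(StandardCharsets.US_ASCII);
        try (OutputStream out = post.getOutputStream()) {
            out.write(form);
        }
        System.out.println("POST status: " + post.getResponseCode());
    }
}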

When the HTTP protocol was designed in 1989 the world was a simpler place; the issues of phishing, email spam and other rogue forms of misdirection were yet to surface. In 1999, with revision 1.1 of the HTTP protocol, the HTTP TRACE method was added. Having done software development in the past, I find the motivation clear: it was designed as a debugging tool. The method simply has the web server echo a request that was sent to it. Sounds benign enough, except that if your web browser has cookies associated with the path sent to the HTTP TRACE method, the web server will readily echo the cookie values. Again this sounds benign enough, except that in the contemporary world of rogue JavaScript (embedded in a malicious HTML email that happens to be spam), the HTTP TRACE method in conjunction with the XMLHttpRequest object can be used to read cookie values and then send them off to someone who just might then go into your bank account and relieve you of your finances. This forms the basis of a Cross Site Tracing attack, a close cousin of Cross Site Scripting.
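
If you want to see the echo for yourself against a server you control, a few lines of Java suffice. The host and cookie value here are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: issuing an HTTP TRACE request and printing what the server echoes back.
// The host and cookie value are placeholders; try this only against a server you control.
public class TraceEcho {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://www.example.com/some/path").openConnection();
        conn.setRequestMethod("TRACE");
        conn.setRequestProperty("Cookie", "SESSIONID=placeholder-value");

        System.out.println("Status: " + conn.getResponseCode());
        // If the server services TRACE, the body is a copy of the request, Cookie header and all.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}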

The XMLHttpRequest object in JavaScript forms the basis of Web 2.0 applications, i.e., applications that are very dynamic in nature and feel like a local desktop application. You could turn off JavaScript but you will find out soon enough that it is a lost cause. Most sites rely on JavaScript to provide the experience that end users are accustomed to.

One problem in the management of IT infrastructure and the applications that live on that infrastructure is time. In time, every computer system that is deployed will degrade and become a legacy system.

Any computer system has two factors working against it.

The hardware technologies employed will become antiquated and may not be up to the task as an organization grows. Even if the system is up to the task, other factors readily contribute over time, such as the availability of replacement parts or, in some cases, the vendor of the original hardware ceasing support or going out of business. Some systems are labeled legacy systems but very much still form the crux of what is happening in the here and now. The Internal Revenue Service grapples with the problem of having a well entrenched legacy system.

With the software residing on a system, a wide range of factors contribute to it moving to legacy status: everything from employee attrition (a waning knowledge base), to business requirements no longer being met by the software initially deployed, to the difficulty of finding people with the skill sets to make changes to the system.

The factors and their weight will vary widely, but just as time weighs down on our knees, time will weigh down any computer system from both the hardware side and the software side, making continued use a questionable business proposition, either in terms of the opportunity cost of lost business or the prohibitive cost of the status quo. In the case of the IRS, the government being what it is, it has resources (your tax dollars) that the private sector could not muster; a private company in the same position would be looking at bankruptcy.

This past week I ran into one of these legacy systems: an Internet facing application server running JBoss. JBoss is intended to run the Java code that makes up web applications. Red Hat acquired JBoss in April of 2006 for $350 million. Red Hat is well ensconced in the IT industry and JBoss is still well supported, so what made this system a legacy system? The version of JBoss running on the server in question dated to March of 2003 and the staff that put it into use were no longer with the organization. More to the point, there was little documentation. This is a classic case of attrition making a system difficult to support, and why managers should be big fans of knowledge transparency in the form of wikis and of allocating time up front to document systems. My role in this case was strictly that of system administrator of the underlying Linux OS.

This Internet facing application was flagged by auditors for a Cross Site Scripting vulnerability on account of the web server servicing the HTTP TRACE method. I was tasked with seeing if the system could be reconfigured to turn off servicing HTTP TRACE requests.

As I performed a discovery process I learned that the first tier responding to HTTP requests was a piece of open source software complementary to JBoss called Jetty, a web server written entirely in Java. After I ferreted out where Jetty's configuration file lived, I discovered through online investigation that the version of Jetty bundled with the version of JBoss found on the production system did not have a facility for turning off the HTTP TRACE method. In fact, the version of Jetty employed was one minor revision away from having the ability to turn off HTTP TRACE.

A quick initial assessment of the situation, purely as a function of the application served up as well as the application server itself (JBoss/Jetty), seems to leave only difficult choices. If the application server software is updated, it could very well break the underlying application. As alluded to earlier, part of the problem with legacy systems is knowledge transfer, or the lack thereof. Outside of a cursory inspection of this Internet facing web application, such as logging on to check base functionality, little was documented - no regression test plan or even the tools to carry out such a regression. Perhaps an upgrade of the underlying application server would go well initially (the software starts) but application functionality would break (what end users perceive). Rewriting the application is not realistic either. The application's entry point was flagged during an audit, so more than likely remediation is expected sooner rather than later. In addition, asking managers to expend resources to rewrite applications impromptu is not likely to garner support (think budgets), especially if the scope of the application is large. Nor is it likely to thrill tech people who may be on timelines with other projects (see point 8).

Indeed this seems like a very thorny problem, with choices that entail lots of risk. Unless, that is, you happen to be familiar with the various levels of indirection, how the layers play among themselves and, even more importantly, the tools to manipulate them.

The Apache web server is an amazing piece of software. Despite years of Microsoft giving Internet Information Server (IIS) away for free, and despite Steve Ballmer's hot air in calling one operating system that popularly runs Apache - Linux - a cancer, it still remains the most popular web server in use on the Internet today. Rather than take my word for it (or not), visit Netcraft, which offers a tool that lets you see what web server your favorite web site is using. The link provided shows the web server platforms behind the sites people were most curious about - observe that Microsoft's platform is barely on the radar (12/2007).

This is no coincidence. One of Larry Wall's mantras when he designed his popular Perl programming language was to make the easy things easy while making the difficult things possible. It is a philosophy that shows up consistently among proponents of open source technologies. It is one thing to be the end user of a site, say Amazon, that may be a complex heterogeneous mix of computing platforms (ignorance, as they say, is bliss), but it is quite another if you must administer the platforms that comprise such a site. The more degrees of freedom that exist to make the system pliable, the greater the ability to adapt to situations as they arise, planned or unplanned... such as when an auditor is complaining.

What specifically makes Apache so powerful are all the modules of extensibility that come with the system 'out of the box'. They are often called mods for short. Being part of the open source community, the Apache web server has engendered an active developer community that affords administrators great flexibility not only in configuration but also in the manipulation of HTTP traffic.

Manipulation of HTTP traffic is one powerful use case for Apache. In this capacity, an Apache web server instance never actually hosts any web pages but is used as a traffic cop, redirecting HTTP traffic based on rule sets and giving the end user the illusion of a cohesive web site when in actuality the site may be an amalgam of different web servers.

The Apache modules mod_proxy and mod_rewrite provide these facilities; their extensive use cases and the minutiae of rule set semantics are beyond this write up. Suffice it to say that they can be used to solve the thorny problem of turning off the HTTP TRACE method.

The solution is simply to have the original web server that fronts the offending application listen on a non-standard TCP port such as 8080, which is not visible to the outside world on account of a firewall, have Apache listen on the well known port (80), and then use a mod_proxy/mod_rewrite rule set to direct traffic based on the HTTP method. If a request comes in using the HTTP TRACE method, simply deny it.

This is easily done through:

RewriteEngine on
# Deny any request made with the TRACE method (403 Forbidden)
RewriteCond %{REQUEST_METHOD} ^TRACE
RewriteRule .* - [F]

# Proxy everything else to the JBoss/Jetty instance listening on port 8080
RewriteRule ^/(.*) http://localhost:8080/$1 [P]

The RewriteCond statement looks for the TRACE method and the RewriteRule after it causes Apache to return a 403 Forbidden page. The second RewriteRule is a catch-all that simply has Apache delegate all other HTTP traffic to the server listening on port 8080 on the same system Apache resides on - which in this case is the offending JBoss/Jetty server that will gladly service HTTP TRACE requests. Apache configured as such filters those TRACE requests, so all the auditor sees is a result that nullifies the previous finding that this particular Internet facing web application services the HTTP TRACE method. And thus, with a deployment of the Apache web server and the addition of four statements to its configuration file: no blown budgets, no interrupted developers.
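
Verifying the result from the outside is equally short; this sketch (placeholder hostname) repeats the TRACE request through Apache and looks for the 403:

import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: verifying the fix. With Apache fronting the application on port 80,
// a TRACE request should now come back 403 Forbidden. The hostname is a placeholder.
public class VerifyTraceDisabled {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://www.example.com/").openConnection();
        conn.setRequestMethod("TRACE");
        int status = conn.getResponseCode();
        System.out.println(status == 403
                ? "TRACE blocked (403) - auditor satisfied"
                : "TRACE still serviced, status " + status);
    }
}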

In the interest of a balanced viewpoint, if you have a contemporary application load balancing device such as those from F5, the filtering of the HTTP TRACE method can happen there. In this case, however, the application was not fronted by such a device. Even when such a device exists, the politics at play within an organization may make it more work to involve other teams in a network change than to reconfigure software on the single server that ultimately receives the offending requests.

A better decision point, however, is the number of points of potential change. If an Internet application is backed by an entire web server farm, it is more cost effective to manage access control from a central location such as an F5 device than to visit a multitude of boxes to make configuration changes. Since that was not the case here, an Apache solution was employed.

To quote Arthur C. Clarke, "Any sufficiently advanced technology is indistinguishable from magic." If these tiers of indirection are not on your radar, yep, magic.
