Linux Today: Linux News On Internet Time.

Apache Today: Writing Input Filters for Apache 2.0

Nov 22, 2000, 14:14
By Ryan Bloom

"Input filtering and output filtering are basically the same thing, with some very minor differences. Both input and output filtering rely on buckets and bucket brigades to pass data from one filter to the next. Both have filters that are associated with the connection and filters that are associated with the request."

"Output filters are relatively straightforward: the filter is handed data, which it either adds to or modifies, and that data is passed to the next filter. Input filtering cannot work this way because Apache isn't generating the data; it has to rely on getting the data from the network. Because of this difference, input filters are called with an empty brigade, and they pass this brigade to the next filter. The lowest filter in the chain inserts data into the brigade and returns to the previous filter. That filter can then modify the data and return the brigade to the filter above it, and so on until the brigade is returned to the Apache core."
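To make the "pull" direction concrete, here is a minimal standalone C sketch of an input chain. The types `brigade` and `filter` and all function names are hypothetical stand-ins invented for illustration; they are not the APR/httpd types such as `apr_bucket_brigade` or `ap_filter_t`, and a single flat buffer replaces a real list of buckets. The upper filter passes the empty brigade down, the lowest filter (standing in for the network) fills it, and the upper filter modifies the data on the way back up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bucket brigade: one flat buffer. */
typedef struct {
    char data[256];
    size_t len;
} brigade;

typedef struct filter filter;
typedef int (*filter_fn)(filter *f, brigade *bb);

/* Hypothetical stand-in for a filter in the input chain. */
struct filter {
    filter_fn fn;
    filter *next;   /* the filter closer to the network */
    void *ctx;      /* per-filter state */
};

/* Lowest filter: pretends to read from the network by copying the
 * string in ctx into the (empty) brigade it was handed. */
static int network_filter(filter *f, brigade *bb)
{
    const char *src = (const char *)f->ctx;
    size_t n = strlen(src);
    if (n >= sizeof bb->data)
        return -1;
    memcpy(bb->data, src, n + 1);
    bb->len = n;
    return 0;
}

/* Higher filter: pass the empty brigade down first, then modify
 * whatever the lower filters put into it (here: upper-case it). */
static int upcase_filter(filter *f, brigade *bb)
{
    assert(f->next != NULL);
    int rv = f->next->fn(f->next, bb);
    if (rv != 0)
        return rv;
    for (size_t i = 0; i < bb->len; i++)
        bb->data[i] = (char)toupper((unsigned char)bb->data[i]);
    return 0;   /* hand the filled brigade back to the caller above */
}

/* What the "core" does: give an empty brigade to the top of the chain. */
static int read_through_chain(filter *top, brigade *bb)
{
    bb->len = 0;
    bb->data[0] = '\0';
    return top->fn(top, bb);
}
```

With `net = { network_filter, NULL, "get /index.html" }` and `up = { upcase_filter, &net, NULL }`, calling `read_through_chain(&up, &bb)` leaves `"GET /INDEX.HTML"` in the brigade: the data flowed upward even though the call flowed downward.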

"Input filters differ from output filters in one other significant manner. Most output filters deal only with actual data; headers are stored in a table in the request_rec, and a core filter converts that table into a stream of data that is sent to the client. The output headers filter sits low enough in the filter stack that only filters dealing with formatting the data for transmission to the client (e.g. chunking) come after it. Input filtering and headers have a very different relationship. All data coming from a client, headers included, must pass through the input filters to get to the Apache core. This means that input filters have an opportunity to change the headers of a request before the core ever sees them."
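Since headers reach an input filter as ordinary bytes rather than as a table, a filter that wants to rewrite them first has to locate them in the raw stream. The following is a minimal standalone sketch, assuming the request line and headers already sit in one contiguous buffer (in a real filter they may be split across buckets and across several reads). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the request line (without its CRLF), or 0 if the first
 * CRLF has not arrived yet. */
static size_t request_line_len(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return i;
    return 0;
}

/* Number of bytes making up the request line plus all headers,
 * including the blank CRLF line that ends them, or 0 if that blank
 * line has not arrived yet. Anything after this offset is body data,
 * which a header-rewriting filter would pass through untouched. */
static size_t header_block_len(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 3 < len; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n' &&
            buf[i + 2] == '\r' && buf[i + 3] == '\n')
            return i + 4;
    return 0;
}
```

For `"GET / HTTP/1.0\r\nHost: localhost\r\n\r\nbody"`, the request line is 14 bytes and the header block is 35 bytes, leaving the 4-byte body after it. Returning 0 for an incomplete block is a design choice that lets a caller keep reading until the blank line appears.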

"The module that I am presenting this month modifies the headers of a request while Apache reads it. This module came about at ApacheCon Europe 2000 because of the CD that was distributed with the conference proceedings. The CD was created on a Windows machine, and the proceedings were organized as a web site. The problem was that the HTML used spaces and backslashes (\) in the URLs for each page. Unfortunately, the URL "http://localhost/foo\Test Page.html" is not the same as "http://localhost/foo/Test%20Page.html": the first is not a valid URL, while the second is. The CD was tested with Internet Explorer, which automatically converts these invalid URLs into valid ones."
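The two URLs in the example differ in exactly two ways: a backslash where a forward slash belongs, and a literal space where `%20` belongs. The rest of the article is not reproduced here, so the following is only a hypothetical standalone sketch of that conversion applied to a raw request line, not the module's actual code. It assumes a client that sends the path verbatim, so the URI is taken to be everything between the first and the last space of the request line; `fix_uri` and `fix_request_line` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the URI in[0..n) into out, turning '\\' into '/' and ' ' into
 * "%20". Returns the output length, or (size_t)-1 if out is too small. */
static size_t fix_uri(const char *in, size_t n, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return (size_t)-1;
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        char one[2] = { in[i], '\0' };
        const char *rep = one;
        if (in[i] == '\\')
            rep = "/";
        else if (in[i] == ' ')
            rep = "%20";
        size_t r = strlen(rep);
        if (o + r + 1 > outsz)          /* keep room for the NUL */
            return (size_t)-1;
        memcpy(out + o, rep, r);
        o += r;
    }
    out[o] = '\0';
    return o;
}

/* Rewrite one request line (without its CRLF) as
 * METHOD SP fixed-URI SP VERSION. Returns 0 on success, -1 if the
 * line is malformed or out is too small. */
static int fix_request_line(const char *line, char *out, size_t outsz)
{
    const char *first = strchr(line, ' ');
    const char *last = strrchr(line, ' ');
    if (first == NULL || first == last)
        return -1;                      /* need at least two spaces */
    size_t mlen = (size_t)(first - line) + 1;   /* method plus space */
    if (mlen >= outsz)
        return -1;
    memcpy(out, line, mlen);
    size_t ulen = fix_uri(first + 1, (size_t)(last - first - 1),
                          out + mlen, outsz - mlen);
    if (ulen == (size_t)-1)
        return -1;
    size_t used = mlen + ulen;
    size_t tail = strlen(last);         /* e.g. " HTTP/1.0" */
    if (used + tail + 1 > outsz)
        return -1;
    memcpy(out + used, last, tail + 1); /* copies the NUL as well */
    return 0;
}
```

Given `"GET /foo\\Test Page.html HTTP/1.0"` (a C literal for a single backslash), `fix_request_line` produces `"GET /foo/Test%20Page.html HTTP/1.0"`. Splitting on the first and last space, rather than on every space, is what lets the embedded space in the path survive until it is escaped.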
