Apache2: Rewrite REQUEST_URI based on a bulk list of GET parameters in QUERY_STRING
Recently I searched for a way to rewrite a REQUEST_URI based on GET parameters in QUERY_STRING. To make it more complicated, I had a list of ~2000 parameters that had to be rewritten as follows:
if %{QUERY_STRING} starts with one of <parameters>:
rewrite %{REQUEST_URI} from /new/ to /old/
Honestly, it took me several hours to find a solution that was satisfying and scaled well. Hopefully, this post will save time for others who need a similar solution.
Research and first attempt: RewriteCond %{QUERY_STRING} ...
After reading through some documentation, particularly Manipulating the Query String, the following ideas came to my mind at first:
RewriteCond %{REQUEST_URI} ^/new/
RewriteCond %{QUERY_STRING} ^(param1)(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(param2)(.*)$ [OR]
...
RewriteCond %{QUERY_STRING} ^(paramN)(.*)$
RewriteRule /new/ /old/?%1%2 [R,L]
Or, instead of a separate RewriteCond for each parameter, a single condition with alternation:
RewriteCond %{QUERY_STRING} ^(param1|param2|...|paramN)(.*)$
There has to be something smarter ...
But with ~2000 parameters to look up, neither of the solutions seemed particularly smart. Both scale really badly, and checking ~2000 conditions for every ^/new/ request is probably rather heavy work for Apache.
Instead I was searching for a way to look up a string in a compiled list of strings. RewriteMap seemed like it might be what I was searching for. I read the Apache2 RewriteMap documentation and finally found a solution that worked as expected, with one limitation. But read on ...
The solution: RewriteMap and RewriteCond ${mapfile:%{QUERY_STRING}} ...
Finally, the solution was to use a RewriteMap that contains all parameters to be rewritten and to check the parameter given in each request against this map within a RewriteCond. If the parameter is found in the map, a simple RewriteRule applies.
For the impatient, here's the rewrite magic from my VirtualHost configuration:
RewriteEngine On
RewriteMap RewriteParams "dbm:/tmp/rewrite-params.map"
RewriteCond %{REQUEST_URI} ^/new/
RewriteCond ${RewriteParams:%{QUERY_STRING}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^/new/ /old/ [R,L]
A more detailed description of the solution
First, I created a RewriteMap at /tmp/rewrite-params.txt with all parameters to be rewritten. A RewriteMap requires two fields per line: the key to look up and the value to return for it. Since I use the RewriteMap merely for checking the condition, not for real string replacement, the second field doesn't matter to me. I ended up putting my parameters in both fields, but you could choose any arbitrary value for the second field:
/tmp/rewrite-params.txt:
param1 param1
param2 param2
...
paramN paramN
Then I created a DBM hash map file from that plain text map file, as DBM maps are indexed, while TXT maps are not. In other words: with big maps, DBM lookups are much faster:
httxt2dbm -i /tmp/rewrite-params.txt -o /tmp/rewrite-params.map
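One thing that may be worth checking: httxt2dbm and mod_rewrite have to agree on the DBM type, and the defaults can differ between builds. A minimal sketch that names the type explicitly on both sides, assuming SDBM is available in the local build (the choice of SDBM is only an example, not what I used above):

```apache
# Assumption: the map was built with an explicit DBM type, for example
#   httxt2dbm -f SDBM -i /tmp/rewrite-params.txt -o /tmp/rewrite-params.map
# so the same type is named in the RewriteMap directive.
RewriteMap RewriteParams "dbm=sdbm:/tmp/rewrite-params.map"
```

Depending on the DBM type, more than one file with a common base name may be created. In that case the RewriteMap directive still refers to the base name.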
Now, let's go through the rewrite magic from the VirtualHost configuration above line by line. The first line should be clear: it enables the Apache Rewrite Engine:
RewriteEngine On
The second line defines the RewriteMap that I created above. It contains the list of parameters to be rewritten:
RewriteMap RewriteParams "dbm:/tmp/rewrite-params.map"
The third line limits the rewrites to REQUEST_URIs that start with /new/. It acts as an explicit guard against redirect loops, since requests that have already been redirected to /old/ never satisfy it and are therefore not rewritten again. Note that the anchored RewriteRule pattern ^/new/ in the last line enforces the same restriction:
RewriteCond %{REQUEST_URI} ^/new/
The fourth line is the core condition: it checks whether QUERY_STRING (the GET parameters) is listed in the RewriteMap. The fallback value 'NOT_FOUND' is returned if the lookup finds no match. The condition is therefore only true if the lookup was successful and the QUERY_STRING was found in the map:
RewriteCond ${RewriteParams:%{QUERY_STRING}|NOT_FOUND} !=NOT_FOUND
The last line is a simple RewriteRule from /new/ to /old/. It is executed only if all previous conditions are met. The flags are R for redirect (issuing an HTTP redirect to the browser) and L for last (causing mod_rewrite to stop processing immediately after that rule):
RewriteRule ^/new/ /old/ [R,L]
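If the redirect is meant to be permanent, the R flag also accepts a status code. A minimal variant of the last line, assuming a 301 is wanted (this is not what my configuration above uses):

```apache
# Same rule as above, but with an explicit permanent redirect.
# Assumption: the move from /new/ to /old/ is permanent; plain [R] sends a 302.
RewriteRule ^/new/ /old/ [R=301,L]
```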
Known issues
A big limitation of this solution (compared to the ones above) is that it looks up the whole QUERY_STRING in the RewriteMap. Therefore, it works only if param is the only GET parameter. If there are additional GET parameters, the second rewrite condition fails and nothing is rewritten, even if the first GET parameter is listed in the RewriteMap.
If anyone comes up with a solution to this limitation, I would be glad to learn about it :)
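One approach that might address this is to capture the first GET parameter in a separate RewriteCond and look up only that captured part. The following is an untested sketch, assuming that parameters are separated by "&" and that the map keys match the first parameter exactly as it appears before the first "&":

```apache
RewriteEngine On
RewriteMap RewriteParams "dbm:/tmp/rewrite-params.map"

# Only consider requests below /new/.
RewriteCond %{REQUEST_URI} ^/new/
# Capture the first GET parameter, i.e. everything up to the first "&".
RewriteCond %{QUERY_STRING} ^([^&]+)
# Look up only the captured part (%1) instead of the whole QUERY_STRING.
RewriteCond ${RewriteParams:%1|NOT_FOUND} !=NOT_FOUND
RewriteRule ^/new/ /old/ [R,L]
```

This relies on %1 referring to the group captured by the last matched RewriteCond. If the map holds only parameter names while requests carry name=value pairs, the capture could be narrowed to ^([^&=]+). Unlike the regex attempts above, this is an exact match on the first parameter, not a prefix match.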