<p>mejo roaming, 2016-09-17, <a href="https://blog.freesources.org//posts/2016/09/apache_rewritemap_querystring/">apache rewritemap querystring</a></p>
<h1><em>Apache2</em>: Rewrite <em>REQUEST_URI</em> based on a bulk list of GET parameters in <em>QUERY_STRING</em></h1>
<p>Recently I searched for a solution to rewrite a <em>REQUEST_URI</em> based on <em>GET
parameters</em> in <em>QUERY_STRING</em>. To make it even more complicated, I had a
list of ~2000 parameters that have to be rewritten like the following:</p>
<pre><code>if %{QUERY_STRING} starts with one of &lt;parameters&gt;:
rewrite %{REQUEST_URI} from /new/ to /old/
</code></pre>
<p>Honestly, it took me several hours to find a solution that was satisfying and
scaled well. Hopefully, this post will save some time for others who need
a similar solution.</p>
<h2>Research and first attempt: <em>RewriteCond %{QUERY_STRING} ...</em></h2>
<p>After reading through some documentation, particularly
<em><a href="https://wiki.apache.org/httpd/RewriteQueryString">Manipulating the Query String</a></em>,
the following ideas came to my mind at first:</p>
<pre><code>RewriteCond %{REQUEST_URI} ^/new/
RewriteCond %{QUERY_STRING} ^(param1)(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(param2)(.*)$ [OR]
...
RewriteCond %{QUERY_STRING} ^(paramN)(.*)$
RewriteRule /new/ /old/?%1%2 [R,L]
</code></pre>
<p>or, instead of a separate RewriteCond for each parameter:</p>
<pre><code>RewriteCond %{QUERY_STRING} ^(param1|param2|...|paramN)(.*)$
</code></pre>
<h2 id="There_has_to_be_something_smarter_...">There has to be something smarter ...</h2>
<p>But with ~2000 parameters to look up, neither of the solutions seemed
particularly smart. Both scale really badly, and it's probably rather heavy
work for Apache to check ~2000 conditions for every <em>^/new/</em> request.</p>
<p>Instead, I was searching for a way to look up a string in a compiled
list of strings. <em>RewriteMap</em> seemed like it might be what I was searching
for. I read the Apache2 RewriteMap documentation
<a href="https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritemap">here</a>
and <a href="https://httpd.apache.org/docs/current/rewrite/rewritemap.html">here</a> and
finally found a solution that worked as expected, with one limitation. But
read on ...</p>
<h2>The solution: <em>RewriteMap</em> and <em>RewriteCond ${mapfile:%{QUERY_STRING}} ...</em></h2>
<p>Finally, the solution was to use a <em>RewriteMap</em> with all parameters that
shall be rewritten and check given parameters in the requests against this
map within a <em>RewriteCond</em>. If the parameter matches, the simple <em>RewriteRule</em>
applies.</p>
<p>For the impatient, here's the rewrite magic from my VirtualHost configuration:</p>
<pre><code>RewriteEngine On
RewriteMap RewriteParams "dbm:/tmp/rewrite-params.map"
RewriteCond %{REQUEST_URI} ^/new/
RewriteCond ${RewriteParams:%{QUERY_STRING}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^/new/ /old/ [R,L]
</code></pre>
<h3 id="A_more_detailed_description_of_the_solution">A more detailed description of the solution</h3>
<p>First, I created a plain text <em>RewriteMap</em> file at <em>/tmp/rewrite-params.txt</em> with all
parameters to be rewritten. A <em>RewriteMap</em> requires two fields per
line, one with the origin and the other one with the replacement part.
Since I use the <em>RewriteMap</em> merely for checking the condition, not for
real string replacement, the second field doesn't matter to me. I ended
up putting my parameters in both fields, but you could choose any
arbitrary value for the second field:</p>
<p>/tmp/rewrite-params.txt:</p>
<pre><code>param1 param1
param2 param2
...
paramN paramN
</code></pre>
<p>Then I created a <em><a href="https://httpd.apache.org/docs/current/rewrite/rewritemap.html#dbm">DBM hash map file</a></em>
from that <em><a href="https://httpd.apache.org/docs/current/rewrite/rewritemap.html#txt">plain text map file</a></em>,
as DBM maps are indexed, while TXT maps are not. In other words: with big
maps, DBM is a huge performance boost:</p>
<pre><code>httxt2dbm -i /tmp/rewrite-params.txt -o /tmp/rewrite-params.map
</code></pre>
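<p>For a short list, or while testing, the plain text file could also be used
directly, without the conversion step. A minimal sketch, assuming the same map
name and file path as above; only the map type prefix changes:</p>
<pre><code># Plain text map: no httxt2dbm step needed, but lookups are not indexed
RewriteMap RewriteParams "txt:/tmp/rewrite-params.txt"
</code></pre>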
<hr />
<p>Now, let's go through the <em>VirtualHost configuration rewrite magic</em> from
above line by line. The first line should be clear: it enables the <em>Apache
Rewrite Engine</em>:</p>
<pre><code>RewriteEngine On
</code></pre>
<p>The second line defines the <em>RewriteMap</em> that I created above. It contains the
list of parameters to be rewritten:</p>
<pre><code>RewriteMap RewriteParams "dbm:/tmp/rewrite-params.map"
</code></pre>
<p>The third line limits the rewrites to <em>REQUEST_URIs</em> that start with <em>/new/</em>.
This makes the guard against rewrite loops explicit: requests that have already
been redirected to <em>/old/</em> must not be rewritten again. In VirtualHost context,
the pattern <em>^/new/</em> of the <em>RewriteRule</em> enforces the same restriction, so the
condition is a safety net rather than a strict requirement:</p>
<pre><code>RewriteCond %{REQUEST_URI} ^/new/
</code></pre>
<p>The fourth line is the core condition: it checks whether <em>QUERY_STRING</em> (the
GET parameters) is listed in the <em>RewriteMap</em>. The fallback value 'NOT_FOUND'
is returned if the lookup doesn't match. The condition is therefore only true if the
lookup was successful and the <em>QUERY_STRING</em> was found in the map:</p>
<pre><code>RewriteCond ${RewriteParams:%{QUERY_STRING}|NOT_FOUND} !=NOT_FOUND
</code></pre>
<p>The last line is a simple <em>RewriteRule</em> from <em>/new/</em> to <em>/old/</em>. It is
executed only if all previous conditions are met. The
<em><a href="https://httpd.apache.org/docs/current/rewrite/flags.html">flags</a></em> are <em>R</em>
for <em>redirect</em> (issuing an HTTP redirect to the browser) and <em>L</em> for <em>last</em>
(causing <em>mod_rewrite</em> to stop processing immediately after that rule):</p>
<pre><code>RewriteRule ^/new/ /old/ [R,L]
</code></pre>
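<p>Without an explicit status code, <em>R</em> issues a 302 (temporary) redirect. If
the move from <em>/new/</em> to <em>/old/</em> is meant to be permanent, the status can be
given with the flag. A sketch, assuming a permanent redirect is really what is
wanted (browsers cache 301 responses, so it may be worth testing with the
default first):</p>
<pre><code># Same rule as above, but answering with "301 Moved Permanently"
RewriteRule ^/new/ /old/ [R=301,L]
</code></pre>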
<h2 id="Known_issues">Known issues</h2>
<p>A big limitation of this solution (compared to the ones above) is that it
looks up the whole <em>QUERY_STRING</em> in the <em>RewriteMap</em>. Therefore, it works only
if <em>param</em> is the only <em>GET parameter</em>. If there are additional GET parameters,
the second rewrite condition fails and nothing is rewritten, even if the first
GET parameter is listed in the <em>RewriteMap</em>.</p>
<p>If anyone comes up with a solution to this limitation, I would be glad to
learn about it :)</p>
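<p>One approach that might lift this limitation is to capture only the first
GET parameter in an extra <em>RewriteCond</em> and look up that captured part
(<em>%1</em>) instead of the whole <em>QUERY_STRING</em>. The following is an untested sketch,
not part of the setup described above. It assumes that the keys in the map are
bare parameter names; if the keys are <em>name=value</em> pairs, the character class
would have to be <em>[^&amp;]</em> instead:</p>
<pre><code>RewriteEngine On
RewriteMap RewriteParams "dbm:/tmp/rewrite-params.map"
# Capture the first GET parameter name (everything before the first "=" or "&amp;")
RewriteCond %{QUERY_STRING} ^([^&amp;=]+)
# Look up only the captured name (%1), not the whole QUERY_STRING
RewriteCond ${RewriteParams:%1|NOT_FOUND} !=NOT_FOUND
RewriteRule ^/new/ /old/ [R,L]
</code></pre>
<p>Since the substitution <em>/old/</em> contains no query string of its own, the
original <em>QUERY_STRING</em>, including any additional parameters, should be
passed on unchanged with the redirect.</p>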
<h2 id="Links">Links</h2>
<ul>
<li><a href="https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritemap">Apache Module mod_rewrite documentation</a></li>
<li><a href="https://httpd.apache.org/docs/current/rewrite/rewritemap.html">Apache Module mod_rewrite: Using RewriteMap</a></li>
<li><a href="https://wiki.apache.org/httpd/RewriteQueryString">Apache Httpd Wiki: Manipulating the Query String</a></li>
<li><a href="https://www.webmasterworld.com/apache/4730807.htm">WebmasterWorld.com Forum: Query strings in RewriteMap</a></li>
<li><a href="http://www.jeremytunnell.com/posts/mod_rewrite-attempting-to-bend-rewritemap-rewritecond-and-rewriterule-to-my-will">Jeremy Tunnell: Mod_rewrite: Attempting to bend RewriteMap, RewriteCond, and RewriteRule to my will...</a></li>
</ul>