Fun with regex substitutions in Apache

Continuing from my previous post, I was now able to access my AltaVista server; however, from a web browser I still couldn’t actually view any of the documents remotely.

In the pages, though, I did get the MS-DOS path to the Usenet article in question.

Now how do I turn that into a URL?

Well, as it turns out, mod_substitute (the Apache module that provides the Substitute directive) does support regex, and regex can re-order the captured pieces of a match!

After a bit of googling, I found this page on Stack Overflow about converting a date between UK and US formats:

s/(\d{4})-(\d{2})-(\d{2})/$1-$3-$2/
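To check what that expression actually does, here is a minimal sketch using Python's `re` module, which understands the same `(\d{4})`-style capture groups. The one difference worth knowing is that Python writes the back-references in the replacement as `\1`, `\2`, `\3` instead of Perl's `$1`, `$2`, `$3`. The function name and sample date are made up for illustration.

```python
import re

# Same pattern as the Perl one-liner: three capture groups for year, month, day.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def swap_day_month(text: str) -> str:
    # \1 \2 \3 are Python's spelling of $1 $2 $3; groups 2 and 3 are
    # emitted in swapped order, which is the whole trick.
    return DATE.sub(r"\1-\3-\2", text)

print(swap_day_month("2016-12-25"))  # 2016-25-12
```

Text without a matching date passes through unchanged, which is also how the Substitute rules below leave non-matching HTML alone.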

Simple, right? So what is going on here? The parentheses define capture groups, and in the substitution part you can recall them with $1, $2, $3, etc. So using this recipe I could take something like this:

u:\b227\comp\sys\laptops\3080

and convert it into the following:

http://debian7/usenet/b227/comp/sys/laptops/3080

The code for this would look something like this:

Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3\">Click for article|"

That rule only covers a three-level path such as u:\b227\comp\3080; the five-level example above needs five groups, so the real config ends up with one rule per path depth.
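To see what the regex engine is actually matching, here is a minimal sketch of that one rule in Python's `re` module. It rests on two assumptions: that Apache collapses each `\\` inside the quoted argument to a single backslash (so `\\\(` reaches the regex engine as `\\(`, a literal backslash followed by the start of a group), and that Python's `\1`-style back-references are a fair stand-in for `$1`. The `<b>` tag in the sample input is hypothetical, since the exact markup AltaVista wraps around the path isn't shown here.

```python
import re

# The pattern from the Substitute line, written as the regex engine should
# see it. ">u:." matches the end of the preceding tag, the drive letter and
# the first backslash; each "\\" after that is one literal backslash.
ARTICLE = re.compile(r">u:.([a-z]{1,}[0-9]{3,})\\([0-9a-z]{1,})\\([0-9]{1,})")
LINK = r'---><a href="http://debian7/usenet/\1/\2/\3">Click for article'

def link_article(html: str) -> str:
    """Replace a three-level u:\\ path with a link into /usenet/."""
    return ARTICLE.sub(LINK, html)

# Hypothetical fragment of an AltaVista results page.
print(link_article(r"<b>u:\b227\comp\3080</b>"))
```

Because the match starts at the `>` that closes the preceding tag, the replacement has to emit a `>` again before the new `<a>` tag. A deeper path such as u:\b227\comp\sys\laptops\3080 passes through this rule untouched, because the last group only accepts digits.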

Now all I had to do was install IIS 4.0 from the Option Pack CD-ROM onto my Windows NT 4.0 Workstation and create a virtual directory, /usenet, pointing at the U: drive where AltaVista did its indexing.

So far, that gives me a config file much like this:

ServerAdmin webmaster@localhost
DocumentRoot /var/www
SSLProxyEngine On
ProxyPass "/altavista/" "https://10.12.0.16/"
ProxyPassReverse "/altavista/" "https://10.12.0.16/"
ProxyRequests Off
RewriteEngine On

SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
#clean up urls
Substitute "s|127.0.0.1:6688|debian7/altavista|n"
Substitute "s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n"
#protect the page
Substitute "s|launch=app||n"
Substitute "s|?pg=config&what=init|?pg=h|n"
#fix title
Substitute "s|<IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \"  BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0>|<a href=\"http://debian7/altavista\"><IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \"  BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0></a>|n"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3\">Click for article|"
# Need links for the u:\news097f1\b120\comp\society\futures\1122
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7/$8\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5\">Click for article|"
# Need links for u:\news002f1\b1\fa.poli-sci\8
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z\.\-]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4\">Click for article|"

<Location /usenet/>
    ProxyPass http://10.12.0.16/usenet/
    RewriteEngine On
    SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
    AddOutputFilterByType SUBSTITUTE text/html
</Location>

# bla bla, rest of the 000-default crap...

Simple, right?
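With ten nearly identical article rules it is easy to get one wrong, so a cheap sanity check is to rebuild the same patterns in Python and push the sample paths from the config comments through them in the same order. This is a sketch under two assumptions: Apache hands the regex engine one backslash for each `\\` in the quoted argument, and the Substitute directives run one after another, each on the output of the previous one. The `rule` helper, the group-building shortcut and the `>`...`<` wrapper around each sample path are hypothetical, invented for this test.

```python
import re

BASE = "http://debian7/usenet/"

# Leading groups for the two spool layouts seen in the config:
#   u:\b227\...    and    u:\news097f1\b120\...
B_HEAD = r"([a-z]{1,}[0-9]{3,})"
NEWS_HEAD = r"(news[0-9]{3,}f[0-9])\\([b0-9]{1,})"

def rule(head, middle, depth):
    """Rebuild one Substitute line: the head group(s), `depth` middle
    components, then the all-digit article number, separated by backslashes."""
    regex = re.compile(r">u:." + head + (r"\\" + middle) * depth + r"\\([0-9]{1,})")
    refs = "/".join(f"\\{i}" for i in range(1, regex.groups + 1))
    return regex, f'---><a href="{BASE}{refs}">Click for article'

# Same order as the config: deepest paths first within each family.
RULES = (
    [rule(B_HEAD, r"([0-9a-z]{1,})", d) for d in range(5, 0, -1)]
    + [rule(NEWS_HEAD, r"([a-z]{1,})", d) for d in range(5, 1, -1)]
    + [rule(NEWS_HEAD, r"([a-z\.\-]{1,})", 1)]
)

def substitute_all(html):
    # Each rule runs on the output of the previous one, mimicking a list
    # of Substitute directives applied in sequence.
    for regex, replacement in RULES:
        html = regex.sub(replacement, html)
    return html

for path in (r"u:\b227\comp\sys\laptops\3080",
             r"u:\news097f1\b120\comp\society\futures\1122",
             r"u:\news002f1\b1\fa.poli-sci\8"):
    print(substitute_all(">" + path + "<"))
```

The ordering matters: a shallow rule can grab a deeper path whenever an intermediate directory name happens to be all digits, so the deepest rules have to be tried first.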

Searching for AltaVista

So now I get a nicely formatted page: I can click the mountain icon to jump back home, and I can click on the articles. However, because the article files have no extensions, and so nothing to map them to a MIME type, the browser just downloads them to my PC instead of displaying them. I guess I need to go through them all, convert them from UNIX to MS-DOS line endings, and stick a .txt extension on every single one of them.
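That cleanup could be sketched as a small script. It is a sketch rather than a tested tool, and it assumes three things: the articles are the files whose names are all digits (as with 3080, 1122 and 8 in the paths above), "UNIX to MS-DOS" means turning bare LF line endings into CRLF, and each original can be deleted once its `.txt` copy is written. `unix_to_dos`, `convert_tree` and `ARTICLE_NAME` are hypothetical names.

```python
import os
import re

# Assumption: article files are named with digits only (3080, 1122, 8, ...).
ARTICLE_NAME = re.compile(r"[0-9]+")

def unix_to_dos(data: bytes) -> bytes:
    """Turn bare LF line endings into CRLF, leaving existing CRLFs alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

def convert_tree(root: str) -> int:
    """Rewrite every article under root as <name>.txt with DOS line endings,
    deleting the original. Returns the number of files converted."""
    converted = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not ARTICLE_NAME.fullmatch(name):
                continue  # skip anything that doesn't look like an article
            src = os.path.join(dirpath, name)
            with open(src, "rb") as f:
                data = f.read()
            with open(src + ".txt", "wb") as f:
                f.write(unix_to_dos(data))
            os.remove(src)
            converted += 1
    return converted
```

Since it deletes the originals, trying it on a copy of the spool first seems prudent. The Substitute replacements would then also need `.txt` appended to each link.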

I still think this thing is far too rickety to put on the internet, but we’ll see.

This entry was posted in apache, Digitial, interenet by neozeed.

About neozeed

What is there to tell? I've loved UNIX-like things since I was first exposed to QNX in high school (we had the Unisys ICONs!), and spent the better part of my teenage years trying to get my own UNIX... In retrospect, I should have bought Coherent. Anyway, I latched onto Linux in 1992, then got some old BSD admin books, and I've been hooked on VAX BSD and other big/ancient things ever since!

2 thoughts on “Fun with regex substitutions in Apache”

    • I tried adding a MIME type for ‘*’ to map to text/plain, but that didn’t help. From what I can see, the easiest way out is to simply rename every file. I think I can do it via Perl, maybe even convert them all to DOS text format too. Then, with the regex, I just append .txt to everything that is a link, and it should magically work.
