As you may remember from my prior attempt at using AltaVista Search, I ran out of space, and found out that it only serves pages on 127.0.0.1:6688 and is pretty much hardcoded to do so. It’s a “fine” hybrid Java 1.01 application, with the bulk of it being Java. I finally got around to setting up a VM, unpacking all of the utzoo archives, and indexing them. I should have done something about the IO, because this took too long (KVM).
So to cheat the system, I installed stunnel as a simple HTTPS-to-HTTP proxy, which let me access my search VM from anywhere. However, it still embedded 127.0.0.1 in all the pages.
Enter an Apache reverse proxy to talk to stunnel to talk to AltaVista search!
First, enable a few modules:
a2enmod substitute
a2enmod proxy
a2enmod ssl
a2enmod proxy_http
a2enmod rewrite
Then add this to the config:
SSLProxyEngine On
ProxyPass "/altavista/" "https://10.12.0.16/"
ProxyPassReverse "/altavista/" "https://10.12.0.16/"
ProxyRequests Off
RewriteEngine On
SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s/1997/2016/ni"
Substitute "s/97/16/ni"
Substitute "s|127.0.0.1:6688|debian7/altavista|n"
Substitute "s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n"
Substitute "s|launch=app||n"
Substitute "s|<a href=http://debian7/altavista/?pg=q&what=0&fmt=d|<!--|n"
Substitute "s|><strong>|-->|n"
Substitute "s|</strong></a>||n"
Substitute "s|>u:\|->u:\|n"
This let me redirect all of those requests into a VM called debian7 on the /altavista path. I also copied the images to the Apache server, and now I get something that looks correct!
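To see how the link-related rules combine, here is a rough Python sketch that applies a subset of the Substitute rules above as plain string replacements, in the same order. The sample line is made up for illustration (I have not copied it from real AltaVista output), and the sketch assumes the replacement strings in the two link rules are the HTML comment markers `<!--` and `-->`.

```python
# A subset of the Substitute rules, as (fixed string, replacement) pairs.
# Order matters: the host rewrite has to run before the rule that matches
# the rewritten href.
RULES = [
    ("127.0.0.1:6688", "debian7/altavista"),
    ("launch=app", ""),
    ("<a href=http://debian7/altavista/?pg=q&what=0&fmt=d", "<!--"),
    ("><strong>", "-->"),
    ("</strong></a>", ""),
]


def rewrite_line(line: str) -> str:
    """Apply every rule to one line of HTML, in order."""
    for old, new in RULES:
        line = line.replace(old, new)
    return line


# Hypothetical result link; the real markup may differ.
sample = (
    "<a href=http://127.0.0.1:6688/?pg=q&what=0&fmt=d&launch=app>"
    "<strong>Some title</strong></a>"
)
print(rewrite_line(sample))
```

The leftover part of the href ends up inside an HTML comment, so the title is still shown but is no longer a link.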
I cut the results short… But here is a search of something simple:
I also killed all the ‘working’ URLs that simply open a desktop application on the index ‘server’. Naturally it was meant as a personal service, but as a server that isn’t any good. As such you can’t click on any search results now. I need something else to take the result blocks like “u:\b128\comp\databases\2852” and turn them into URLs.
Also, as much as I don’t want to re-index, it would be best to cut off the headers, or most of them, so the preview lines make sense. Xref, Path, even From & Newsgroups don’t interest me.
I hate to leave it as ‘good enough’ but if anyone has a solution… I’ll be glad to make this wonderful resource available!
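One possible shape for that missing piece, as a minimal Python sketch: it maps a result path such as u:\b128\comp\databases\2852 onto a web path and wraps any such path found in a page in a link. The base URL `http://debian7/utzoo/` is purely an assumption (nothing is served there today), and the function names are mine.

```python
import re
from urllib.parse import quote

# Assumption: the unpacked archive would be served under this URL.
BASE_URL = "http://debian7/utzoo/"
DRIVE_PREFIX = "u:\\"

# Matches result paths that start with the u:\ drive prefix.
RESULT_PATH = re.compile(r"u:\\[\w\\.-]+", re.IGNORECASE)


def result_path_to_url(path: str, base_url: str = BASE_URL) -> str:
    """Turn a path like u:\\b128\\comp\\databases\\2852 into a URL."""
    if not path.lower().startswith(DRIVE_PREFIX):
        raise ValueError(f"not a u: drive path: {path!r}")
    parts = [p for p in path[len(DRIVE_PREFIX):].split("\\") if p]
    return base_url + "/".join(quote(p) for p in parts)


def linkify(html: str) -> str:
    """Wrap every u:\\ path found in the text in an anchor tag."""
    def repl(match: re.Match) -> str:
        path = match.group(0)
        return f'<a href="{result_path_to_url(path)}">{path}</a>'
    return RESULT_PATH.sub(repl, html)


print(result_path_to_url(r"u:\b128\comp\databases\2852"))
```

Something like this would have to run as a filter between Apache and the browser, or as a small script that post-processes the result pages; mod_substitute in fixed-string mode can’t do it on its own.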
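A minimal sketch of that header trimming in Python, to be run over each article before indexing. It assumes each article is a block of “Name: value” headers, then a blank line, then the body; some of the older utzoo articles may not follow that layout, and those are passed through unchanged. Keeping only Subject and Date is my own guess at a sensible default.

```python
# Assumption: these are the only headers worth showing in a preview.
KEEP_HEADERS = ("subject", "date")


def trim_headers(article: str, keep=KEEP_HEADERS) -> str:
    """Drop every header except those named in `keep`."""
    head, sep, body = article.partition("\n\n")
    if not sep:
        # No blank line found, so we can't tell headers from body.
        return article
    kept = []
    keeping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation of the previous header line.
            if keeping:
                kept.append(line)
            continue
        name, colon, _ = line.partition(":")
        keeping = bool(colon) and name.strip().lower() in keep
        if keeping:
            kept.append(line)
    return "\n".join(kept + ["", body]) if kept else body


example = (
    "Path: utzoo!a!b\n"
    "From: someone@example\n"
    "Newsgroups: comp.databases\n"
    "Subject: Hello\n"
    "Xref: z\n"
    "\n"
    "Body text\n"
)
print(trim_headers(example))
```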
Have you considered using the SWISH-E search engine? It’s open source, and in its day it seemed to be a pretty decent piece of software. It seems to be dead now, so I guess that should mean it meets your criteria for what software you use? 🙂
lol maybe. I’ve nearly gotten a “solution” for AltaVista, and I kind of like how this required middleware to work around its built-in limitations.
I also found a linkbait site with all the utzoo stuff extracted. If I was smart I’d make a better one. It wouldn’t take tooooo much effort. Just time I suppose.
If you used SWISH-E, there would still be a lot of opportunities for writing a bunch of code 🙂 You should be able to write a “filter” or something so the engine gets given some metadata/properties like which newsgroup a post is from, and then provide a field for specifying that in the search form. It depends on what you find fun obviously 🙂