Search Engine for Intranet: Recommendations Needed

03-03-2011, 07:29 AM
We have an intranet that we would like to add a search engine to. This intranet is a massive combination of HTML pages, Word documents, and files of various types (.psd, .lwo, .ai, .png, etc.). Below are the features we'd like.

Searching for <jump> would return a list including the following:

folder containing jump.Lwo
folder containing JUMP.lwo
folder containing jump.psd
any html, htm, doc, txt, etc. file containing "jump" in the body
any html, htm, doc, txt, etc. file containing "jumps," "jumping," etc. in the body
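
To make the matching rules above concrete, here is a rough Python sketch of the two behaviors: case-insensitive matching on file names (returning the containing folder) and a very crude stand-in for stemming on body text. The function names `find_folders_by_name` and `body_matches` are made up for illustration, and the suffix list (s, ed, ing) is an assumption; a real search engine would use a proper stemmer and an index rather than walking the disk on every query.

```python
import os
import re

def find_folders_by_name(root, term):
    """Return sorted folders under root that hold a file whose base name
    equals term, ignoring case and extension (jump.Lwo, JUMP.lwo, jump.psd)."""
    wanted = term.lower()
    hits = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            base, _ext = os.path.splitext(name)
            if base.lower() == wanted:
                hits.add(folder)
    return sorted(hits)

def body_matches(text, term):
    """True if text contains term or a simple inflection of it.
    Assumption: only the suffixes s, ed, and ing are tried."""
    pattern = r"\b" + re.escape(term) + r"(?:s|ed|ing)?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

For example, `body_matches("She keeps jumping.", "jump")` would match, while a longer unrelated word such as "jumper" would not.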

It would be nice to search a massive texture directory for "brick" and have a list of all the brick images appear along with a thumbnail for easy browsing, but thumbnail generation is not necessary.

The site is updated infrequently, so I could run the index manually when I update the site, but if it ran on a timer that'd be fine as well.

So where can I get this for free :) ?

Seriously, I'm looking at TSEP right now, and will do some tests with it today. But I'm a novice (at best) when it comes to PHP and the like, so any advice would be helpful.


03-03-2011, 03:17 PM
IBM OmniFind Yahoo! Edition

- Free
- Linux RedHat or Windows Operating Systems
- works as designed
- Product is end-of-life
- max. 500,000 documents
- works for 200 file types
- no support

03-03-2011, 03:38 PM
Thanks Eagle66, I will look into that :)

03-03-2011, 04:26 PM
Google Search Appliance (http://www.google.com/enterprise/search/gsa.html) ... It's the only Intranet search engine even worth looking at.

03-03-2011, 05:37 PM
IBM OmniFind Yahoo! Edition
Clicking on Installation Guide returns an error, which does not bode well. I shall read more.

Google Search Appliance
This looks perfect, except for the price. It'd be easier to persuade my boss to let me put a search engine on our site if there was a free open source solution. Perhaps one gets what they pay for in this situation.

03-04-2011, 06:56 PM
I'm interested in any easy-to-integrate tech too.

03-04-2011, 07:52 PM
Maybe Lucene by the Apache Foundation?

There are plugins to handle PDF, Word, etc. Free and pretty decent.

03-04-2011, 11:02 PM
Maybe Lucene by the Apache Foundation?
We did a bit of research and even attempted to use Lucene during our first go-'round, but it remains an extremely inferior design. The Apache page claims that it is "high performance", but we beg to differ. According to index times, it is over 120 times slower on index retrievals and quantity partitions. They are still using an old TF/IDF ranking system (the Excite search engine used to use that... remember them? Didn't think so). So even when it finds the document you want, if that document shares any commonality with more than 10 other documents, the chance that you will find it first is about 20%. So if you had 200 documents that shared rankings, you might find your document only after going through over 160 of them first.

And once you got over 256GB of index data, it usually cratered. These days, that's not as much as you think it is.

So... just an FYI in case you are still interested. There are a handful of solutions out there, and they all work fairly well at low volumes, but once you really start to amass some serious volumes, there are only 1 or 2 that will maintain an acceptable performance level.
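
For anyone unfamiliar with the TF/IDF ranking mentioned in this post, here is a minimal Python sketch of the textbook idea: a term counts for more when it is frequent in a document and rare across the collection. This is the generic formula only, not Lucene's actual scoring code, and the function name `tf_idf_scores` and the pre-tokenized input format are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf_scores(docs, query):
    """Score each document (a list of tokens) against the query terms
    using textbook TF-IDF: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        score = 0.0
        for term in query:
            if df[term] == 0 or not tokens:
                continue  # term unseen in the collection, or empty document
            tf = counts[term] / len(tokens)
            idf = math.log(n / df[term])
            score += tf * idf
        scores.append(score)
    return scores
```

Documents with identical term statistics get identical scores under this scheme, so their relative order depends on whatever tie-breaking the engine applies.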