
Search Engine for Intranet: Recommendations Needed



BlueApple
03-03-2011, 07:29 AM
We have an intranet that we would like to add a search engine to. The intranet is a massive combination of HTML pages, Word documents, and files of various types (.psd, .lwo, .ai, .png, etc.). Below are the features we'd like.

Searching for <jump> would return a list including the following:

folder containing jump.Lwo
folder containing JUMP.lwo
folder containing jump.psd
any html, htm, doc, txt, etc. file containing "jump" in the body
any html, htm, doc, txt, etc. file containing "jumps," "jumping," etc. in the body
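The matching rules in the list above can be sketched roughly as follows. This is only an illustration in Python, not how TSEP or any other product mentioned in this thread works; the function names (`stem`, `filename_matches`, `body_matches`) are made up for the example, and the suffix stripping is deliberately crude. A real engine would use a proper stemmer.

```python
import re

def stem(word):
    """Crude suffix stripping (assumption: good enough for jump/jumps/jumping)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def filename_matches(query, filename):
    """Case-insensitive match on the file name without its extension."""
    base = filename.rsplit(".", 1)[0]
    return base.lower() == query.lower()

def body_matches(query, text):
    """True if any word in the text shares a stem with the query."""
    target = stem(query)
    return any(stem(token) == target for token in re.findall(r"[A-Za-z]+", text))
```

With these helpers, `filename_matches("jump", "JUMP.lwo")` and `body_matches("jump", "She keeps jumping")` both come out true, while a body that never mentions a form of "jump" does not match.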


It would be nice to search a massive texture directory for "brick" and have a list of all the brick images appear along with a thumbnail for easy browsing, but thumbnail generation is not necessary.

The site is updated infrequently, so I could run the index manually when I update the site, but if it ran on a timer that'd be fine as well.

So where can I get this for free :) ?

Seriously, I'm looking at TSEP right now, and will do some tests with it today. But I'm a novice (at best) when it comes to PHP and the like, so any advice would be helpful.

Gracias.

Eagle66
03-03-2011, 03:17 PM
IBM OmniFind Yahoo! Edition
http://omnifind.ibm.yahoo.net/productinfo.php

- Free
- Linux RedHat or Windows Operating Systems
- works as designed
- Product is end-of-life
- max. 500,000 documents
- works for 200 file types
- no support

:) :hey:

BlueApple
03-03-2011, 03:38 PM
Thanks Eagle66, I will look into that :)

Hopper
03-03-2011, 04:26 PM
Google Search Appliance (http://www.google.com/enterprise/search/gsa.html) ... It's the only Intranet search engine even worth looking at.

BlueApple
03-03-2011, 05:37 PM
IBM OmniFind Yahoo! Edition
Clicking on Installation Guide returns an error; that does not bode well. I shall read more.

Google Search Appliance
This looks perfect, except for the price. It'd be easier to persuade my boss to let me put a search engine on our site if there was a free open source solution. Perhaps one gets what they pay for in this situation.

Matt
03-04-2011, 06:56 PM
I'm interested in any easy-to-integrate tech too.

Pixelthekid
03-04-2011, 07:52 PM
Maybe Lucene by the Apache Foundation?
http://lucene.apache.org/java/docs/index.html

There are plugins to handle PDF, Word, etc. Free and pretty decent.

Hopper
03-04-2011, 11:02 PM
Maybe Lucene by the Apache Foundation?
http://lucene.apache.org/java/docs/index.html
We did a bit of research and even attempted to use Lucene during our first go-'round, but it remains an extremely inferior design. The Apache page claims that it is "high performance", but we beg to differ. According to index times, it is over 120 times slower on index retrievals and quantity partitions. They are still using an old TF/IDF ranking system (Excite search engine used to use that ... remember them? .. Didn't think so..). So even when it finds the documents you want, if your document shares any commonality with more than 10 other documents, the chances that you will find it first are about 20%. So if you had 200 documents that shared rankings, you might find your document after going through over 160 of them first.

And once you got over 256GB of index data, it usually cratered. These days, that's not as much as you think it is.

So .. just an FYI in case you are still interested. There are a handful of solutions out there, and they all work fairly well at low volumes, but once you really start to amass some serious volumes... there are only 1 or 2 that will maintain an acceptable performance level.
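
For anyone unfamiliar with the TF/IDF ranking mentioned in the post above, here is a minimal sketch in Python. It is the textbook formula only, not Lucene's actual scoring code, and the function name `tf_idf_scores` and the input layout (a dict of document name to token list) are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Score each document for the query using plain TF-IDF.

    docs: dict mapping a document name to its list of lowercase tokens.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)      # term frequency
            idf = math.log(n_docs / df[term])    # rarer terms weigh more
            score += tf * idf
        scores[name] = score
    return scores
```

Documents with the same term counts and lengths get identical scores under this scheme, which is one way the ties described above can come about.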