Viajero is a text search engine, developed to support this site's wikis and to experiment with a simple traversal approach to indexing. It works in three steps:
- Build a catalog (index file) of words, each referencing the files in which it can be found.
- Search expressions by looking up each term in the word list, which yields the set of files containing that term.
- Intersect (or union) the sets and return the URL list.
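The three steps above can be sketched in a few lines of Java. This is a minimal illustration, not Viajero's actual code: the class name InvertedIndexSketch, its methods, and the choice to split text on non-word characters are all assumptions made for the example.

```java
import java.util.*;

// Hypothetical sketch of a word catalog with set-based search.
// It does not reproduce Viajero's real classes or tokenization rules.
class InvertedIndexSketch {
    // Position in this list is the document id.
    private final List<String> urls = new ArrayList<>();
    // Word -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> catalog = new TreeMap<>();

    // Step 1: add every word of a document to the catalog.
    void addDocument(String url, String text) {
        int id = urls.size();
        urls.add(url);
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                catalog.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Steps 2 and 3: look up each term, then intersect (matchAll)
    // or union the resulting sets and map the ids back to URLs.
    List<String> search(List<String> terms, boolean matchAll) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> hits =
                catalog.getOrDefault(term.toLowerCase(), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else if (matchAll) {
                result.retainAll(hits);
            } else {
                result.addAll(hits);
            }
        }
        List<String> found = new ArrayList<>();
        if (result != null) {
            for (int id : result) {
                found.add(urls.get(id));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("file:/C:/test/Hello.txt", "hello all");
        index.addDocument("file:/C:/test/Goodbye.txt", "goodbye all");
        // Intersection: only Hello.txt contains both terms.
        System.out.println(index.search(Arrays.asList("hello", "all"), true));
        // Union: both files contain at least one of the terms.
        System.out.println(index.search(Arrays.asList("hello", "goodbye"), false));
    }
}
```

Storing document ids rather than full URLs in each set keeps the catalog small and makes intersection and union plain set operations.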
The code is very simple and is split into two Java classes: one for indexing and searching, and another that supports indexing by traversing the URLs to be indexed.
Viajero can be used as a standalone program or as an engine embedded in another application.
This sample call looks for the text java in wiki files inside the wikis folder and generates an index file contents.idx:
java -jar viajero.jar -e wiki -r wikis -i contents.idx java
Right now the only improvement that interests me is adding an option to rebuild the index file incrementally, checking the modification date of already indexed files.
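One way such an incremental rebuild could decide what to re-index is sketched below. This is not part of Viajero: the class IncrementalCheckSketch, its methods, and the idea of keeping a map from file URL to the time it was last indexed are assumptions made purely for illustration.

```java
import java.io.File;
import java.util.*;

// Hypothetical sketch: pick the files that need re-indexing by comparing
// each file's modification date with the time it was last indexed.
class IncrementalCheckSketch {
    // A file is stale when it was never indexed (indexedAt == null)
    // or was modified after it was indexed.
    static boolean isStale(Long indexedAt, long lastModified) {
        return indexedAt == null || lastModified > indexedAt;
    }

    // indexedAt maps a file URL to the timestamp (in milliseconds)
    // recorded when that file was last indexed.
    static List<File> filesToReindex(List<File> candidates,
                                     Map<String, Long> indexedAt) {
        List<File> stale = new ArrayList<>();
        for (File f : candidates) {
            String url = f.toURI().toString();
            if (isStale(indexedAt.get(url), f.lastModified())) {
                stale.add(f);
            }
        }
        return stale;
    }
}
```

Entries for files that were deleted since the last run would also have to be dropped from the index; the sketch leaves that out.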
Listing 1: Sample code that uses the engine
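import java.io.File;
import java.util.*;

// File extensions to index
Set exts = new HashSet();
exts.add("java");
// Terms to search for
List terms = new ArrayList();
terms.add("java");
terms.add("search");
// Traverse from the current directory, then search the engine
File root = new File(".");
Motor engine = new Motor();
new Viajero(root, engine, exts).run();
Vector urls = engine.search(terms);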
Listing 2: A sample index file
file:/C:/test/Hello.txt
file:/C:/test/Goodbye.txt
goodbye 1
hello 0
all 0,1
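Listing 2 can be read as a list of URLs followed by word entries, where each number is the zero-based position of a URL in that list. That reading is an interpretation of the sample, not a documented format, and the parser below is a hypothetical sketch built on it (the class IndexFileSketch and its fields are invented for the example).

```java
import java.util.*;

// Hypothetical reader for an index laid out like Listing 2.
// Assumes: URL lines come first; a word line is "<word> <n>[,<n>...]".
class IndexFileSketch {
    final List<String> urls = new ArrayList<>();
    final Map<String, List<String>> wordToUrls = new LinkedHashMap<>();

    static IndexFileSketch parse(List<String> lines) {
        IndexFileSketch idx = new IndexFileSketch();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int space = line.lastIndexOf(' ');
            boolean isWordLine = space > 0
                && line.substring(space + 1).matches("\\d+(,\\d+)*");
            if (isWordLine) {
                // Resolve each number to the URL at that position.
                // A number past the end of the URL list would throw here.
                List<String> found = new ArrayList<>();
                for (String n : line.substring(space + 1).split(",")) {
                    found.add(idx.urls.get(Integer.parseInt(n)));
                }
                idx.wordToUrls.put(line.substring(0, space), found);
            } else {
                idx.urls.add(line);
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        IndexFileSketch idx = parse(Arrays.asList(
            "file:/C:/test/Hello.txt",
            "file:/C:/test/Goodbye.txt",
            "goodbye 1",
            "hello 0",
            "all 0,1"));
        // Under the assumed reading, "all" resolves to both files.
        System.out.println(idx.wordToUrls.get("all"));
    }
}
```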
Download
The sources are licensed under the Apache License 2.0 (ASF 2.0) and can be downloaded here: /lib/viajero.jar.