Viajero
Owner: csilva, Version: 6, Date: Thu 15, March 2007

Viajero is a text search engine, developed to support the wikis on this site and to explore a simple traversal approach to indexing.

It works in three steps:

  • Build a catalog (index file) of words, each referencing the files in which it can be found.
  • Evaluate a search expression by looking up each term in the word list, which yields the set of files containing that term.
  • Intersect (or union) the sets and return the resulting URL list.
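
The three steps can be sketched in a few lines of Java. This is only an illustration and not Viajero's actual code: the class TinyIndex, its methods and the sample file contents are made up for this example, and words are assumed to be split on non-word characters and lowercased.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration of the three steps; not Viajero's real classes.
class TinyIndex {
    // Position in this list is the file number used in the catalog.
    private final List<String> urls = new ArrayList<>();
    // Catalog: word -> numbers of the files that contain it.
    private final Map<String, TreeSet<Integer>> catalog = new HashMap<>();

    // Step 1: register the file and add each of its words to the catalog.
    void add(String url, String text) {
        int id = urls.size();
        urls.add(url);
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                catalog.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Steps 2 and 3: look up each term, then intersect (matchAll) or union the sets.
    List<String> search(List<String> terms, boolean matchAll) {
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> hits = catalog.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);   // copy, so the catalog is not modified
            } else if (matchAll) {
                result.retainAll(hits);         // intersection
            } else {
                result.addAll(hits);            // union
            }
        }
        List<String> found = new ArrayList<>();
        if (result != null) {
            for (int id : result) {
                found.add(urls.get(id));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("file:/C:/test/Hello.txt", "hello all");      // assumed contents
        index.add("file:/C:/test/Goodbye.txt", "goodbye all");  // assumed contents
        System.out.println(index.search(Arrays.asList("all", "hello"), true));
        System.out.println(index.search(Arrays.asList("hello", "goodbye"), false));
    }
}
```

Sorted sets keep the returned URLs in the order the files were indexed, which makes the result easy to compare by eye.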

The code is very simple and is split into two Java classes: one for indexing and searching, and another that supports indexing by traversing the URLs to be indexed.

Viajero can be used as a standalone program or as an engine embedded in another application.

This sample call searches for the text java in wiki files inside the wikis folder and generates the index file contents.idx:

java -jar viajero.jar
     -e wiki
     -r wikis 
     -i contents.idx 
    java

Right now the only improvement that interests me is an option to rebuild the index file incrementally, by checking the modification dates of files that are already indexed.
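
One possible shape for that check is sketched below. Nothing here exists in Viajero: the class IncrementalCheck and its methods are hypothetical, and the sketch assumes the index would also store, for each URL, the modification time seen when the file was last indexed.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; Viajero does not provide this.
class IncrementalCheck {
    // URL -> modification time (ms since epoch) recorded when the file was last indexed.
    private final Map<String, Long> indexedAt = new HashMap<>();

    // True when the file is new to the index or has changed since it was indexed.
    boolean needsIndexing(String url, long lastModified) {
        Long recorded = indexedAt.get(url);
        return recorded == null || lastModified > recorded;
    }

    // Call after (re)indexing a file so that the next run can skip it.
    void markIndexed(String url, long lastModified) {
        indexedAt.put(url, lastModified);
    }

    public static void main(String[] args) {
        IncrementalCheck check = new IncrementalCheck();
        File[] children = new File(".").listFiles();
        if (children == null) {
            return; // not a readable folder
        }
        for (File f : children) {
            String url = f.toURI().toString();
            if (f.isFile() && check.needsIndexing(url, f.lastModified())) {
                System.out.println("would index " + url);
                check.markIndexed(url, f.lastModified());
            }
        }
    }
}
```

Files deleted since the last run would still need to be dropped from the catalog; the sketch does not cover that.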

Listing 1: Sample code that uses the engine

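import java.io.File;
import java.util.*;

Set exts = new HashSet();        // file extensions to index
exts.add("java");
List terms = new ArrayList();    // terms to search for
terms.add("java");
terms.add("search");
File root = new File(".");       // folder where the traversal starts

// Viajero traverses the files under root and feeds them to the engine.
Motor engine = new Motor();
new Viajero(root, engine, exts).run();

Vector urls = engine.search(terms);  // URLs of the matching files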

Listing 2: A sample index file

file:/C:/test/Hello.txt
file:/C:/test/Goodbye.txt

goodbye 1
hello 0
all 0,1
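
Judging only from Listing 2, the file appears to hold the URL list first, then a blank line, then one line per word followed by the zero-based positions of the files that contain it. Under that assumption, a reader could look roughly like this. The class IndexFileSketch is hypothetical and the real format may have details that the listing does not show.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the layout shown in Listing 2.
class IndexFileSketch {
    final List<String> urls = new ArrayList<>();
    final Map<String, List<Integer>> words = new LinkedHashMap<>();

    static IndexFileSketch parse(String content) {
        IndexFileSketch index = new IndexFileSketch();
        boolean inWords = false;
        for (String line : content.split("\\r?\\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                // The first blank line after the URLs starts the word section.
                if (!index.urls.isEmpty()) {
                    inWords = true;
                }
                continue;
            }
            if (!inWords) {
                index.urls.add(line);
                continue;
            }
            int space = line.lastIndexOf(' ');
            if (space < 0) {
                continue; // skip lines that do not look like "word ids"
            }
            List<Integer> ids = new ArrayList<>();
            for (String id : line.substring(space + 1).split(",")) {
                ids.add(Integer.parseInt(id.trim()));
            }
            index.words.put(line.substring(0, space), ids);
        }
        return index;
    }

    // Resolve a word to the URLs of the files that contain it.
    List<String> filesFor(String word) {
        List<String> result = new ArrayList<>();
        for (int id : words.getOrDefault(word, new ArrayList<>())) {
            result.add(urls.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "file:/C:/test/Hello.txt\nfile:/C:/test/Goodbye.txt\n\n"
                + "goodbye 1\nhello 0\nall 0,1\n";
        System.out.println(parse(sample).filesFor("all"));
    }
}
```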

Download

The sources are licensed under the Apache License 2.0 (ASF 2.0) and can be downloaded here: /lib/viajero.jar.

