Viajero
Owner: csilva, Version: 6, Date: Thu 15 March 2007

Viajero is a text search engine, developed to support search on this site's wikis and to experiment with a simple traversal for indexing. It works in three steps:

  • Build a catalog (index file) of words, each referencing the files in which it can be found.
  • Evaluate a search expression by looking up each term in the word list, which yields the set of files where that term occurs.
  • Intersect (or union) the sets and return the URL list.
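
The three steps above can be sketched in a few lines of Java. This is only an illustration of the idea, not Viajero's code: the class name InvertedIndexSketch, its methods, and the lower-casing and word-splitting rules are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; these are not Viajero's actual classes.
class InvertedIndexSketch {
    // Document URLs; a document's id is its position in this list.
    private final List<String> urls = new ArrayList<>();
    // Catalog: word -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> catalog = new HashMap<>();

    // Step 1: register a document and record every word it contains.
    void add(String url, String text) {
        int id = urls.size();
        urls.add(url);
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                catalog.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Steps 2 and 3: look up each term, then intersect (matchAll = true)
    // or union (matchAll = false) the per-term sets and map ids to URLs.
    List<String> search(List<String> terms, boolean matchAll) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> hits = catalog.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else if (matchAll) {
                result.retainAll(hits);
            } else {
                result.addAll(hits);
            }
        }
        List<String> found = new ArrayList<>();
        if (result != null) {
            for (int id : result) {
                found.add(urls.get(id));
            }
        }
        return found;
    }
}
```

Storing small integer ids in the per-word sets, rather than full URLs, keeps the catalog compact and makes the intersection cheap.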

The code is very simple and is split into two Java classes: one for indexing and searching, and another that supports indexing by traversing the URLs to be indexed.

Viajero can be used as a standalone program or as an engine embedded in another application.

This sample call looks for the text java in wiki files inside the wikis folder and generates an index file contents.idx:

java -jar viajero.jar -e wiki -r wikis -i contents.idx java

Right now, the only interesting improvement (for me) is to add an option to rebuild the index file incrementally by checking the modification date of already indexed files.
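
One way such an incremental check might look is sketched below. This feature does not exist in Viajero yet, so everything here is hypothetical: the class IncrementalCheckSketch, its method names, and the assumption that a modification time is stored for each indexed URL.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the planned incremental rebuild; not part of Viajero.
class IncrementalCheckSketch {
    // URL -> modification time recorded when the file was last indexed.
    private final Map<String, Long> indexedAt = new HashMap<>();

    // True when the file is new or has changed since it was last indexed.
    boolean needsIndexing(String url, long lastModified) {
        Long seen = indexedAt.get(url);
        return seen == null || lastModified > seen;
    }

    // Remember the modification time that was current at indexing time.
    void markIndexed(String url, long lastModified) {
        indexedAt.put(url, lastModified);
    }
}
```

For local files, java.io.File's lastModified() can supply the timestamp. The recorded times would also have to be saved alongside the index file so that they survive between runs.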

Listing 1: Sample code that uses the engine

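import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Vector;

// Motor and Viajero ship in viajero.jar (their package is not shown here).
Set exts = new HashSet();        // file extensions to index
exts.add("java");
List terms = new ArrayList();    // terms to search for
terms.add("java");
terms.add("search");
File root = new File(".");       // folder where the traversal starts

Motor engine = new Motor();
new Viajero(root, engine, exts).run();   // traverse root and feed the engine

Vector urls = engine.search(terms);      // URLs returned for the terms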

Listing 2: A sample index file

file:/C:/test/Hello.txt
file:/C:/test/Goodbye.txt

goodbye 1
hello 0
all 0,1
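
Reading the sample back could look like the sketch below. The layout is inferred only from Listing 2 (URL lines, a blank line, then one "word id,id,..." line per word, where each id is the zero-based position of a URL); the real file format may differ, and the class IndexFileSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the layout seen in Listing 2; not Viajero's own parser.
class IndexFileSketch {
    // Returns word -> URLs, resolving each numeric id against the URL section.
    static Map<String, List<String>> parse(List<String> lines) {
        List<String> urls = new ArrayList<>();
        Map<String, List<String>> wordToUrls = new LinkedHashMap<>();
        boolean inWords = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // The first blank line after the URLs starts the word section.
                if (!urls.isEmpty()) {
                    inWords = true;
                }
                continue;
            }
            if (!inWords) {
                urls.add(trimmed);
                continue;
            }
            int space = trimmed.lastIndexOf(' ');
            if (space < 0) {
                continue; // skip lines that do not match the assumed shape
            }
            String word = trimmed.substring(0, space);
            List<String> found = new ArrayList<>();
            for (String id : trimmed.substring(space + 1).split(",")) {
                found.add(urls.get(Integer.parseInt(id.trim())));
            }
            wordToUrls.put(word, found);
        }
        return wordToUrls;
    }
}
```

Applied to the sample, the entry for "all" resolves to both files, while "hello" and "goodbye" each resolve to one.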

Download

The sources are licensed under the Apache License 2.0 and can be downloaded here: /lib/viajero.jar.

