MM3-WebAssistant

Search

Indexing

Searching the cache archives with the WebAssistant requires an index. Text and HTML files (pages) are indexed. The Indexer's algorithm is essentially language independent. Uppercase characters are always converted to the corresponding lowercase characters. The Latin and Russian alphabets are supported, as well as some special characters of European languages.
Please inform Tools if you need support for another language.
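
The lowercase conversion described above can be illustrated with a minimal sketch. The token pattern (runs of letters, without digits or underscores) and the function name normalize_words are assumptions made for illustration; the Indexer's actual tokenization rules are not documented here.

```python
import re

# Assumed token pattern: runs of Unicode letters (no digits, no underscores).
# This is an illustration, not the Indexer's documented behavior.
WORD = re.compile(r"[^\W\d_]+")


def normalize_words(text):
    """Return the words found in text, converted to lowercase."""
    return [word.lower() for word in WORD.findall(text)]


# Latin, Russian and a German special character are handled the same way.
print(normalize_words("Search ПОИСК Größe"))
# → ['search', 'поиск', 'größe']
```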

Script file

You start the update with one of the following script files in the folder: MM3-WebAssistantProfessional/​script/

Script               Operating System
MM3-Utility.bat      Microsoft Windows
MM3-Utility.sh       Linux and UNIX
MM3-Utility.command  Apple Mac OS X
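
As a small illustration of the mapping between operating system and script file, the following sketch picks the script name from the value of Python's platform.system(). The function name utility_script and the fallback to the .sh script for other UNIX-like systems are assumptions.

```python
import platform

# Script names as listed in this section, keyed by platform.system() values.
SCRIPTS = {
    "Windows": "MM3-Utility.bat",
    "Linux": "MM3-Utility.sh",
    "Darwin": "MM3-Utility.command",  # Mac OS X reports itself as "Darwin"
}


def utility_script(system=None):
    """Return the script file name for the given or the current system."""
    system = system or platform.system()
    # Assumption: any other UNIX-like system uses the shell script.
    return SCRIPTS.get(system, "MM3-Utility.sh")


print(utility_script())
```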

In the first dialog all utilities are displayed.

Start MM3-Indexer

Select: Update from archived pages
Click Next to open the Indexer configuration dialog.

MM3-Indexer Configuration

Configuration of the Indexer

You can configure the following settings for indexing:

  • Select the cache archive to be indexed
  • Specify the minimum word length
    Only words that reach the minimum word length are included in the index. Put simply, the word length is the number of characters in a word.
  • Display the positive and negative word lists
    • Negative word list
      These words are not included in the index.
      Stop word lists exist for English, Russian and German.
      If you have created additional stop words, please let us know.
    • Positive word list
      These words are indexed even if they are shorter than the minimum word length.

    The word lists are stored in the files positive.*.txt and negative.*.txt in the folder MM3-WebAssistantProfessional/config/search/. You can adapt the word lists to your needs. The character * stands for a language-specific word list, e.g. en for English and de for German. All files whose names follow this pattern are used. We recommend identifying the language with the ISO 639 language codes.
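
The interplay of minimum word length, negative list and positive list can be sketched as follows. This is only an illustration under stated assumptions: the file layout (one word per line, UTF-8), the precedence of the negative list over the positive list, and the function names load_word_lists and is_indexed are not taken from the product.

```python
from pathlib import Path


def load_word_lists(folder, prefix):
    """Merge all <prefix>.*.txt files in folder into one lowercase set.

    Assumes one word per line and UTF-8 encoding.
    """
    words = set()
    for path in sorted(Path(folder).glob(f"{prefix}.*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            word = line.strip().lower()
            if word:
                words.add(word)
    return words


def is_indexed(word, min_length, positive, negative):
    """Decide whether a word would be included in the index."""
    word = word.lower()
    if word in negative:   # stop words are never indexed (assumed precedence)
        return False
    if word in positive:   # indexed even if shorter than the minimum length
        return True
    return len(word) >= min_length


positive = {"pc"}
negative = {"the", "and"}
words = ["The", "PC", "is", "archived"]
print([w for w in words if is_indexed(w, 3, positive, negative)])
# → ['PC', 'archived']
```

With real word lists, the two sets would come from calls such as load_word_lists(folder, "positive") and load_word_lists(folder, "negative"), where folder points to the config/search/ folder named above.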

Start the indexing after you have made the settings. The time required depends on the size of the archives, so indexing can take some time. Please close the WebAssistant before indexing.

Log output

The output of the Indexer shows:

  • Indexed cache archives
  • Number of files still to be indexed
  • Domain currently being indexed
  • Elapsed time
  • Progress bar
  • Summary statistics for the indexing run

Starting with a command line

You can also start the Indexer with the following command line:

java -jar MM3-WebAssistant.jar Indexer cacheActive=D:\CacheArchiv\ minWordLength=2 withNumber=yes start
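
For scripted runs, the command line above can be assembled as an argument list. The sketch below only builds the list and does not start the Indexer. The function name indexer_command is hypothetical, and the value withNumber=no is an assumption, since only yes appears in the example above.

```python
def indexer_command(cache_dir, min_word_length=2, with_number=True,
                    jar="MM3-WebAssistant.jar"):
    """Build the Indexer command line as a list of arguments."""
    return [
        "java", "-jar", jar, "Indexer",
        f"cacheActive={cache_dir}",
        f"minWordLength={min_word_length}",
        # "no" is an assumed counterpart to the documented value "yes".
        f"withNumber={'yes' if with_number else 'no'}",
        "start",
    ]


command = indexer_command("D:\\CacheArchiv\\")
print(" ".join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) from the folder that contains the jar file.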

Out of Memory

The memory required depends on the size of the archives and the chosen minimum word length. If indexing needs more memory, you can increase the memory available to the Indexer in the script file. Alternatively, you can split the cache archive into several archives or increase the minimum word length.
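
How the script files set the memory is not shown here. A common way to give a Java program more memory is the JVM option -Xmx, and the following sketch adds it to a java command given as an argument list. The function name with_max_heap and the value 1024m are arbitrary examples, not recommended settings.

```python
def with_max_heap(command, heap="1024m"):
    """Return a copy of a java command with -Xmx<heap> placed after 'java'."""
    if not command or command[0] != "java":
        raise ValueError("expected a command that starts with 'java'")
    # JVM options must come before -jar, so insert directly after 'java'.
    return [command[0], f"-Xmx{heap}"] + list(command[1:])


command = ["java", "-jar", "MM3-WebAssistant.jar", "Indexer", "start"]
print(" ".join(with_max_heap(command, "2g")))
# → java -Xmx2g -jar MM3-WebAssistant.jar Indexer start
```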