Introduction

Sygol works like any search engine: you type the query, click the Search button and wait for the results.

Note that queries are performed with OR logic, i.e. you will get all pages that contain at least one of the query words. Of course, the pages containing more search words are more likely to be listed first.
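As a rough illustration of OR logic, the sketch below keeps every page that contains at least one query word and lists the pages with more matching words first. The function name or_search and the sample pages are made up for this example, and ranking by match count alone is an assumption: the real ordering only makes such pages "more likely" to come first.

```python
def or_search(query, pages):
    """Return ids of pages containing at least one query word, best matches first."""
    words = set(query.split())
    scored = []
    for page_id, text in pages.items():
        # Count how many distinct query words occur in the page.
        hits = len(words & set(text.split()))
        if hits > 0:
            scored.append((hits, page_id))
    # More matched words first; ties are broken by page id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in scored]


# Hypothetical sample data, not taken from Sygol.
pages = {
    "p1": "good food in Rome",
    "p2": "good weather",
    "p3": "train timetable",
}
print(or_search("good food", pages))  # ['p1', 'p2']
```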

Furthermore, there are no special characters: + # & OR AND . (period) , (comma), etc. are treated as part of the query. For example, +good +food will not look for pages containing both words, but for pages that contain exactly +good or +food. Similarly, quotation marks are not used to search for exact phrases; they are simply discarded.
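A minimal sketch of how such a query might be split into words. It assumes that words are separated by whitespace and that only double quotation marks are discarded; the function name tokenize is hypothetical.

```python
def tokenize(query):
    """Split a query into words, keeping symbols such as + as part of each word."""
    # Assumption: only double quotation marks are dropped, nothing else is special.
    return query.replace('"', "").split()


print(tokenize("+good +food"))    # ['+good', '+food']
print(tokenize('"good food"'))    # ['good', 'food']
print(tokenize("good AND food"))  # ['good', 'AND', 'food']
```

Note that AND is returned as an ordinary word to be searched, not as an operator.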

You may search for Unicode strings, such as Japanese and Korean text.

 

Search options

The following filters may be applied to queries:

Domain: List only web pages from the specified first-level domain.
Internet: List web pages found by the spiders. This is the classic search engine functionality.
Classifieds: List classified ads inserted by users. In fact, ads are spidered like any other web page.
Files: List files, i.e. web entities with extensions such as GIF, JPG, MPEG, PDF, etc. Sygol treats as a file any web resource that does not resolve to a plain HTML file.
Synd: List syndicated pages, i.e. pages whose link was found in an RSS feed.

Precision: From high to low:
  1. Search the exact words only.
  2. Also search for pairs of adjacent words, from left to right, separated by a space. For example, if you type a b c, Sygol will look for occurrences of a, b, c, a b and b c.
  3. Also search for the concatenation of all words. For example, if you type a b c, Sygol will look for a, b, c, a b, b c, abc, a-b-c and a_b_c.
  4. Also search for the concatenation of adjacent word pairs, from left to right. For example, if you type a b c, Sygol will look for all the terms from levels 1, 2 and 3 plus ab, bc, a-b, b-c, a_b and b_c.

Note that levels 1, 2, 3 and 4 yield the same results when you type a single word, and levels 3 and 4 yield the same results when you type two words.
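The four precision levels can be sketched as a function that expands a query into the list of terms to look up. This is only an illustration built from the examples above; the function name search_terms and the numbering of levels as integers 1 to 4 are assumptions.

```python
def search_terms(query, precision):
    """Expand a query into the terms searched at a precision level (1 = highest)."""
    words = query.split()
    pairs = list(zip(words, words[1:]))  # adjacent word pairs, left to right
    separators = ("", "-", "_")

    terms = list(words)  # level 1: the exact words only
    if precision >= 2:
        # level 2: adjacent pairs separated by a space
        terms += [" ".join(pair) for pair in pairs]
    if precision >= 3:
        # level 3: all words concatenated (plain, with hyphens, with underscores)
        terms += [sep.join(words) for sep in separators]
    if precision >= 4:
        # level 4: adjacent pairs concatenated in the same three ways
        terms += [sep.join(pair) for sep in separators for pair in pairs]
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


print(search_terms("a b c", 2))  # ['a', 'b', 'c', 'a b', 'b c']
print(search_terms("a b c", 3))  # ['a', 'b', 'c', 'a b', 'b c', 'abc', 'a-b-c', 'a_b_c']
```

With a single word every level returns just that word, and with two words levels 3 and 4 return the same list, because the duplicates are dropped.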

Language: List only pages written in the specified language (language identification).

N.B. Version Alpha 1.0

 

The results

First part

First of all, if requested by the Sections option, you will see a list of directory sections pertaining to your query.

Next, the list of all words found is shown. The words are colored from very light blue to dark blue. The lighter the blue, the more out of date the index for that word is; the darker the blue, the more up to date it is. A red word means that its index is about to be updated.

By clicking on a word you can search for it on the spot.

You may force the update of a word's index by selecting force index update now. If you repeat the search right afterwards, you will see the word go from red (index about to be updated) to dark blue (index updated). The time required for the update depends on the update queue and on how common the word is. For example, updating www or http will take much longer than updating Nabucondosor.

Note that any word can have its index forcibly updated at most once a week.
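The once-a-week limit could be checked along these lines. This is a sketch only: the function name may_force_update is hypothetical, and it assumes the limit is measured as seven full days since the last forced update of that word.

```python
from datetime import datetime, timedelta

FORCE_INTERVAL = timedelta(weeks=1)  # assumed meaning of "once a week"


def may_force_update(last_forced, now):
    """True if the word was never forced, or was last forced at least a week ago."""
    return last_forced is None or now - last_forced >= FORCE_INTERVAL


now = datetime(2024, 1, 10)
print(may_force_update(None, now))                   # True
print(may_force_update(datetime(2024, 1, 8), now))   # False
print(may_force_update(datetime(2024, 1, 3), now))   # True
```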

To understand these concepts we must look at how the spiders work. The spiders download web pages. Any new word found is stored once and for all in the words database, while all the words found in a page are stored in the pages database. A custom program then takes the word with the most out-of-date index from the words database and extracts from the pages database all the information needed to quickly find the pages that contain that word. This cycle is repeated endlessly and, since the words database contains more than 100,000,000 records, it could be months before a word's index is updated again, even if in the meantime the spiders have found more pages containing that word. By forcing the index update of a word, you will also see the recently spidered pages in the results of a query for that word.
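One step of this cycle can be sketched with two in-memory dictionaries standing in for the words database and the pages database. The data layout, the field names indexed_at and pages, and the function name rebuild_oldest are all assumptions made for illustration; the real program and its storage are not described here.

```python
def rebuild_oldest(words_db, pages_db, now):
    """Rebuild the index of the word whose index is the most out of date."""
    # words_db: word -> {"indexed_at": timestamp, "pages": set of page ids}
    # pages_db: page id -> set of words found in that page
    word = min(words_db, key=lambda w: words_db[w]["indexed_at"])
    entry = words_db[word]
    # Collect every page that currently contains the word.
    entry["pages"] = {pid for pid, found in pages_db.items() if word in found}
    entry["indexed_at"] = now
    return word


# Hypothetical data: page p2 was spidered after "good" was last indexed.
pages_db = {"p1": {"good", "food"}, "p2": {"good"}}
words_db = {
    "good": {"indexed_at": 1, "pages": {"p1"}},
    "food": {"indexed_at": 5, "pages": {"p1"}},
}
print(rebuild_oldest(words_db, pages_db, now=10))  # good
print(sorted(words_db["good"]["pages"]))           # ['p1', 'p2']
```

Calling the function in a loop reproduces the endless cycle: each call refreshes the stalest word, so with many words a given index waits a long time for its turn unless it is moved to the front of the queue.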

(This is how I cope with having only two PCs dedicated to the task.)

Second part

In the second part of the results we see two columns:

  1. The type of result, which may be:

    Right underneath we see (in red) the number of words found in the page among the words typed by the user or derived from the various concatenations.

  2. The description of the result in a format that changes depending on the type of the result itself.