Field relevance
The search is performed only within the product fields you choose. A product field is a container of text stored in your database, and it may contain keywords that help identify products. For example, if the field Product name contains the word 'iPhone', the related item is probably an iPhone, or perhaps just a cover or a charger for an iPhone. If your customer wants an iPhone and not a cover, the iPhone itself must be displayed before any iPhone accessory. This is where field relevance comes into play.
When you perform a search operation, Advanced Smart Search produces a result set that includes items matching the search query and, for each matching item, a score.
The score is a number calculated from the position of each product field in the Field Relevance list and from other statistical information (see below). The relevance of a returned item is determined by comparing its score with the other scores in the result set: items with higher scores are deemed more relevant to the search.
By default, search results are returned in relevance order, so changing the field positions in the Field Relevance list can change the scores and therefore the order in which search results are returned.
To set up the field relevance, select the checkboxes of the fields that contain the keywords useful to identify the product (for example, Product name, Product tags and Model). Then choose which fields are more relevant for the search by moving them up or down in the list: hover over a field and, when you see the four-headed arrow cursor, drag and drop the field to the desired position.
Example
In our example, the word 'iPhone' may be present in all or just in some of the three fields.
If the search returns these three products:
Product Name | Model | Tags
Cover for iPhone 5s | cover iP5s | cover, iPhone, 5s
Apple iPhone 5s | iPhone 5s | Apple, mobiles
Two covers for iPhone 5s blue and green | iP5srg | covers
With Product name first in the field order, followed by Model and Tags, the search returns the sorted list:
Search string | Results | Matches
iphone | 1. Apple iPhone 5s | Product name, Model
iphone | 2. Cover for iPhone 5s | Product name, Tags
iphone | 3. Two covers for iPhone 5s blue and green | Product name
while with Tags first in the field order, followed by Product name and Model, it returns the list:
Search string | Results | Matches
iphone | 1. Cover for iPhone 5s | Tags, Product name
iphone | 2. Apple iPhone 5s | Product name, Model
iphone | 3. Two covers for iPhone 5s blue and green | Product name
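The effect of the field order can be reproduced with a minimal sketch. The scoring rule used here (each field weighs the number of enabled fields minus its position in the list) is an assumption made for illustration only, and the names score and search are hypothetical. The formula actually used by Advanced Smart Search is not given in this page.

```python
# Illustrative only: the weighting rule is an assumption, not the extension's formula.
PRODUCTS = [
    {"Product name": "Cover for iPhone 5s", "Model": "cover iP5s", "Tags": "cover, iPhone, 5s"},
    {"Product name": "Apple iPhone 5s", "Model": "iPhone 5s", "Tags": "Apple, mobiles"},
    {"Product name": "Two covers for iPhone 5s blue and green", "Model": "iP5srg", "Tags": "covers"},
]


def words(text):
    """Split a field into lowercase words, ignoring commas."""
    return text.lower().replace(",", " ").split()


def score(product, term, field_order):
    """Sum the weights of the fields that contain the term as a whole word."""
    total = 0
    for position, field in enumerate(field_order):
        weight = len(field_order) - position  # the first field weighs the most
        if term.lower() in words(product[field]):
            total += weight
    return total


def search(term, field_order):
    """Return the names of matching products, highest score first."""
    scored = [(score(p, term, field_order), p["Product name"]) for p in PRODUCTS]
    matches = [item for item in scored if item[0] > 0]
    matches.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in matches]
```

Under this assumed rule, search("iphone", ["Product name", "Model", "Tags"]) puts Apple iPhone 5s first, while search("iphone", ["Tags", "Product name", "Model"]) puts Cover for iPhone 5s first.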
To further improve the field relevance system, Advanced Smart Search takes into account other parameters to determine the position of a product in the result set. They include:
- The number of words found for each product field.
- The position of each word in the text. Words in sequence that match phrases entered by customers carry more weight than words not in sequence. For example, if a customer searches for 'iPhone 5s', a product with the keywords 'iPhone' and '5s' next to each other will have a higher relevancy score than a product containing the string 'iPhone cover for 5s models'.
- The type of words found. Words can be (in decreasing order of relevancy score): exact words, plurals/singulars, misspellings. For example, if a customer searches for 'iPhone', products with the keyword 'iPhone' will rank higher than products with the plural 'iPhones' or the misspelling 'iPhome'.
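The last two parameters can be sketched as follows. This is not the extension's code: the numeric weights, the 0.8 similarity threshold used to detect misspellings and the function names match_type and in_sequence are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Assumed weights, following the order of relevancy given in the list above.
EXACT, PLURAL, MISSPELLING, NO_MATCH = 3, 2, 1, 0


def match_type(term, word):
    """Classify how a word in a product field matches the search term."""
    term, word = term.lower(), word.lower()
    if term == word:
        return EXACT
    if term + "s" == word or word + "s" == term:
        return PLURAL  # naive plural/singular check
    if SequenceMatcher(None, term, word).ratio() >= 0.8:
        return MISSPELLING  # similar enough to be a probable typo
    return NO_MATCH


def in_sequence(query, text):
    """True when the query words appear next to each other, in order, in text."""
    q, t = query.lower().split(), text.lower().split()
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))
```

For the search 'iPhone 5s', in_sequence is true for 'Apple iPhone 5s' and false for 'iPhone cover for 5s models', so a ranking function could add a bonus to the first product.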
Tips to improve search speed
- To speed up the database queries, select the minimum number of fields necessary to perform a good search. Also, enable the database table indexing.
- If the cache manager is enabled, only the first database query for a given set of keywords takes time to run. All subsequent searches for the same keywords are served from the cache and returned to clients almost instantly.
- Prefer fields with a small amount of text, like name and tags, over large product fields like description. Searching short, well-optimized fields speeds up searches.
- Optimize the texts of the meta tag description and meta tag keyword fields. They help boost pages in external Search Engine Result Pages (SERPs) and also help display relevant results in the Live Search.
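The caching tip can be pictured with a minimal sketch. It does not show how the extension's cache manager is implemented: slow_database_search is a hypothetical stand-in for the real SQL query, and treating keywords as case-insensitive is an assumption.

```python
_cache = {}               # normalized keywords -> cached results
stats = {"queries": 0}    # counts the simulated database queries


def slow_database_search(keywords):
    """Hypothetical stand-in for the real (slow) SQL query."""
    stats["queries"] += 1
    return [f"result for {keywords}"]


def cached_search(keywords):
    """Query the database only the first time a set of keywords is seen."""
    key = " ".join(keywords.lower().split())  # normalize case and spacing
    if key not in _cache:
        _cache[key] = slow_database_search(key)
    return _cache[key]
```

Calling cached_search("iPhone 5s") twice runs the simulated query once. The second call is answered from the dictionary.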
Partial word matches
This option extends results to products containing words partially matching a search term.
In some cases this option returns more relevant results, but it can also include results that have nothing to do with the search query. On large databases, it is always recommended to use this option together with the Fast algorithm. If the search algorithm is set to Default and you enable this option on a large database, SQL queries might slow down.
When to enable this option
Enable this option if your main product keywords are compound words (words made up of two or more words, like booklet or breadknife) and customers might search for the single words (book, knife) in place of the main keywords.
Be careful with this option: leaving your configuration too wide open can cause a lot of false positive results.
Differences between the two search algorithms
Partial word searches are performed by the two search algorithms, Default and Fast, in two slightly different ways.
Fast algorithm:
- It can find "words starting with" the search term.
- The minimum partial word length depends on the value of the variable ft_min_word_len, which can be changed on VPS/Dedicated servers only.
- Search speed is not affected when this option is enabled.
Default algorithm:
- It can find "words starting with" and also "words inside" words.
- The minimum partial word length doesn't depend on any system variable. Even so, it is always recommended to set a partial word length greater than 3 or 4, to avoid a lot of false positive results and slow queries.
- Search speed can become slow on large databases.
This table gives a visual explanation of the difference:
Search Algorithm | Search string | Matches
Default | phone | telephones, microphone, phonebook
Default | book | notebook, bookshelf, booklet
Fast * | phone | phonebook
Fast * | book | bookshelf, booklet
* ft_min_word_len = 3
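The difference shown in the table can be sketched in a few lines. The functions default_partial and fast_partial are hypothetical names that only mimic the matching rules described above (substring versus prefix). They are not the SQL that the extension runs, and the min_len parameter merely stands in for ft_min_word_len.

```python
WORDS = ["telephones", "microphone", "phonebook", "notebook", "bookshelf", "booklet"]


def default_partial(term, words):
    """Default-style matching: the term may appear anywhere inside a word."""
    return [w for w in words if term in w]


def fast_partial(term, words, min_len=3):
    """Fast-style matching: only words starting with the term.

    Terms shorter than min_len (standing in for ft_min_word_len) match nothing.
    """
    if len(term) < min_len:
        return []
    return [w for w in words if w.startswith(term)]
```

Because this sketch runs every search against one shared word list, default_partial("book", WORDS) also returns phonebook, a word the table lists only under the search string phone.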
Tips
If you have products whose main keywords are compound words, optimize the product field contents rather than enabling partial word matching. For example, if the main product keyword is 'bookshelf', enable fields like meta description, meta keywords or tags and add the words 'book' and 'shelf' to their contents. This will also help product pages rank better on Google.
Database optimization
On large databases, search operations can become very slow if tables are not indexed. Even on small and medium databases, some configurations may slow down the server response time, which can negatively impact features requiring high responsiveness (like the Live Search).
The next paragraphs explain how to set up your MySQL database and server to speed up query response times.
Checkbox "Index database tables"
If you are not familiar with databases, think of a database index as an index in a book. If you have a book about dogs and you are looking for the section on Golden Retrievers, there is no reason to flip through the entire book (the equivalent of a full table scan in database terminology) when you can go to the index at the back of the book, which tells you the exact pages where you can find information on Golden Retrievers. Just as a book index contains page numbers, a database index contains pointers to the rows containing the values you are searching for in your SQL.
A database index is a data structure that improves search operations on a database table. Indexes are used to quickly locate data without having to search every row in a database table every time the table is accessed. Indexes can be created using one or more columns of a database table. They hold a copy of the data in those columns, organized so that it can be searched very efficiently.
The downside of indexes is that they require additional space on the disk to maintain the extra copy of data, so the larger a table is, the larger indexes related to that table will be.
Another performance cost of indexes is that whenever you add, delete, or update rows in the corresponding table (for example when inserting, updating or deleting product pages), the same operations have to be applied to the index. Remember that an index needs to contain the same up-to-the-minute data as the table column(s) it covers.
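The book analogy and the cost of keeping an index up to date can be illustrated with a toy word index. This is only a sketch: MySQL's real FULL TEXT index structure is more elaborate, and the names rows, full_scan, indexed_lookup and insert_row are made up for this example.

```python
from collections import defaultdict

# A tiny "table": row id -> product name.
rows = {
    1: "Apple iPhone 5s",
    2: "Cover for iPhone 5s",
    3: "Golden Retriever guide",
}


def full_scan(word):
    """Check every row, like flipping through the entire book."""
    return sorted(rid for rid, text in rows.items() if word in text.lower().split())


def build_index():
    """Map each word to the ids of the rows that contain it."""
    idx = defaultdict(set)
    for rid, text in rows.items():
        for w in text.lower().split():
            idx[w].add(rid)
    return idx


index = build_index()


def indexed_lookup(word):
    """Jump straight to the matching rows, like using the index of the book."""
    return sorted(index.get(word, set()))


def insert_row(rid, text):
    """A write has to update both the table and the index."""
    rows[rid] = text
    for w in text.lower().split():
        index[w].add(rid)
```

Both lookups return the same rows, but the indexed one does not visit rows that cannot match. The extra work moves to insert_row, which is the trade-off described above.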
Index database tables if you need faster searches and if you don't add/update products too often. To create the FULL TEXT indexes, just select the checkbox Index database tables and click on Save.
When tables are indexed, the search engine behaves slightly differently when searches include stop words or words shorter than a certain length. If searches seem slow or give inaccurate results for some common words such as "the" or "and", even when FULL TEXT indexes are enabled, the cause may be the MySQL variables ft_min_word_len and ft_stopword_file. This topic is fully explained in the paragraph Fine tuning MySQL configuration.
Fine tuning MySQL configuration
If your website is hosted on a Virtual Private Server (VPS) or on a Dedicated server and you have root access, you can change settings in the MySQL configuration file.
It is strongly recommended to back up your MySQL database before making any changes to it. We are not responsible for any data loss.
On Shared servers the following settings cannot be changed. For more info, contact your hosting provider.
There are two MySQL variables that you might want to modify to alter the default search behaviour, ft_min_word_len and ft_stopword_file. They can help to provide more accurate and faster results.
On Linux, Unix and Mac, the configuration file that contains these variables is my.cnf; on Windows it is my.ini. Refer to this page https://dev.mysql.com/doc/refman/5.1/en/option-files.html to find the exact file location.
Indexing and FULL TEXT minimum word length
When tables are indexed, words shorter than the value set in the MySQL variable ft_min_word_len cannot be searched by the fast algorithm. Searches for those words can be performed only by the default algorithm.
While the fast algorithm is optimized to use only FULL TEXT indexes and is bound to the minimum word length limit set by the variable ft_min_word_len, the default algorithm can bypass this limit at the cost of slower queries because it has to search every row in the database, losing the advantages of FULL TEXT indexes.
On small databases and on websites hosted on shared servers (where the value of the variable ft_min_word_len cannot be changed), the extra time necessary to scan all rows is negligible, so both the default and fast algorithms can be used without losing much performance.
On large databases it is always recommended to use the fast algorithm. If you use the default algorithm on a big database and also enable the partial word matching option, queries can become slow, especially when they include very common words.
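The minimum word length rule can be sketched as a simple split of the query. The function split_by_min_length is a hypothetical helper, not part of MySQL or of the extension. The default of 4 used here is MySQL's default value for ft_min_word_len.

```python
def split_by_min_length(query, ft_min_word_len=4):
    """Separate the query words a FULL TEXT index can serve from those it cannot."""
    indexed, too_short = [], []
    for word in query.lower().split():
        if len(word) >= ft_min_word_len:
            indexed.append(word)      # searchable through the FULL TEXT index
        else:
            too_short.append(word)    # only findable by scanning the rows
    return indexed, too_short
```

With the default value, the query 'iPhone 5s' leaves '5s' outside the index. Lowering the value to 2 makes both words searchable through the index.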
Below you will find instructions on how to set the variable ft_min_word_len. If you have never done it before or you don't want to risk breaking your MySQL configuration files, contact your hosting provider.
The variable ft_min_word_len can be found within the section [mysqld] of your my.cnf/my.ini file. If there is no variable with that name within the [mysqld] section, add it as shown in the example below.
My.cnf configuration example
[mysqld]
port = 3306
...
log_error = "mysql_error.log"
#bind-address = "127.0.0.1"
# Optional custom stopword list
ft_stopword_file = "/path/to/your/stopword/file.txt"
# Minimum length of the words included in FULL TEXT indexes
ft_min_word_len = 2
After applying the changes, restart the server and rebuild your FULL TEXT indexes by clicking on the button Rebuild indexes.
FULL TEXT indexes and stop words
When someone makes a search, they often type phrases containing words that show up very frequently in pages and have little to do with the information being sought. Most people don't usually search these words anyway, even if they do include them in their queries.
These common words (such as "and" or "the") are called stop words and, for the reasons cited above, should be excluded from searches.
MySQL Server automatically excludes stop words from indexes, so if tables are indexed, stop words in the search string do not match anything. The primary reason for not indexing stop words is to provide more accurate results and increase the search speed.
MySQL uses a built-in stopword list (in English), which can be found here:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html
You can also define your custom stop words, for example if your website language is different from English or if you want to add/remove stop words from the built-in list. Here is a good list in several languages:
http://www.ranks.nl/stopwords
To override the default stopword list, set the MySQL variable ft_stopword_file. You can find it within the section [mysqld] of your my.cnf/my.ini file. If there is no variable with that name within the [mysqld] section, add it as shown in the example below. The variable value can be:
- The path name of the file containing the stopword list (the server looks for the file in the data directory unless an absolute path name is given to specify a different directory).
- An empty string, which disables stopword filtering.
If the variable is not set, MySQL uses the built-in stopword list.
After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULL TEXT indexes by clicking on the button Rebuild indexes.
My.cnf configuration example
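[mysqld]
port = 3306
...
log_error = "mysql_error.log"
#bind-address = "127.0.0.1"
# Minimum length of the words included in FULL TEXT indexes
ft_min_word_len = 4
# Custom stopword list that replaces the built-in one
ft_stopword_file = "/path/to/your/stopword/file.txt"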
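The three possible states of ft_stopword_file (not set, empty string, path to a custom file) can be modelled with a short sketch. It only imitates the behaviour described above: BUILT_IN_SAMPLE is a three-word excerpt standing in for MySQL's much longer built-in list, the custom words are passed in directly instead of being read from a file, and the function names are hypothetical.

```python
# A small excerpt standing in for MySQL's built-in English list (assumption).
BUILT_IN_SAMPLE = {"the", "and", "for"}


def effective_stopwords(setting, custom_words=()):
    """Return the stop words in effect for a given ft_stopword_file setting.

    setting is None when the variable is absent, "" when stopword filtering
    is disabled, or a file path when a custom list is used.
    """
    if setting is None:
        return set(BUILT_IN_SAMPLE)
    if setting == "":
        return set()
    return {w.lower() for w in custom_words}


def strip_stopwords(query, stopwords):
    """Drop stop words from a query, as an index without them would."""
    return [w for w in query.lower().split() if w not in stopwords]
```

For example, with the variable absent, the query 'the cover for iPhone' is reduced to 'cover' and 'iphone'. With an empty string, all four words are kept.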
Button "Rebuild indexes"
Every time you modify your MySQL configuration files, always remember to: 1) restart the server, 2) rebuild indexes. To learn how to restart the server, contact your hosting provider. To rebuild indexes, just click on the button Rebuild indexes. On large tables, this operation could take some time to complete.
If you don't make any changes to your configuration files, there is no need to rebuild indexes. When you just add/modify/remove products from your store, indexes are automatically kept in sync with the tables they refer to.