RT is commonly used to manage a lot of data. Unfortunately, prior toRT 4, it didn't support effective indexing of the content of comments or correspondence. RT 4 makes use of the full-text indexing capabilities of PostgreSQL and Oracle, or the external Sphinx library for MySQL, to efficiently index this content, allowing for high-speed searching of textual content. Due to bugs in version 4.0.0, you'll need version 4.0.1 or higher to take advantage of the full-text indexing detailed below.
By default, with no full-text indexes enabled, RT 4 does not allow
searching of the "Content" field at all, owing to how resource-intensive
unindexed full-text searches are. If you wish to enable non-indexed
full-text searching, like previous versions of RT allowed, you will need
to add the following to your RT_SiteConfig.pm
:
Set(%FullTextSearch,
Enable => 1,
Indexed => 0,
);
The real magic comes from enabling your database's full-text indexes, however. For the section below, we'll be showing the configuration for PostgreSQL; we'll briefly address MySQL, which is slightly more complicated, at the end of this post.
If you intend to enable full-text indexed searches, you should also read
the docs/full_text_indexing.pod
file which shipped with your RT
install, which will provide roughly the same information as below, but
is additionally guaranteed to be up-to-date with your version of RT.
PostgreSQL configuration
To start off with, you'll need to configure your database to store the
index. On PostgreSQL, this consists of adding a new column and creating
an index on it. To do this, run rt-setup-fulltext-index
:
$ /opt/rt4/sbin/rt-setup-fulltext-index
Enter the name of a DB table that will be used to store the Pg tsvector.
You may either use the existing Attachments table, or create a new
table.
[Attachments]:
Enter the name of a column that will be used to store the Pg tsvector:
[ContentIndex]:
Generally, you'll want to store the new column in the Attachments table, right next to the data it is indexing. If you want to back up the index separately, or have other storage constraints, you may wish to create a new table to store it, but the default is almost certainly correct. Similarly, the default column name is also most likely correct.
You may choose between GiST or GIN indexes; the former is several times
slower to search, but takes less space on disk and is faster to update.
[GiST]:
You can read more about the differences between the index types in the PostgreSQL manual.
Going to run the following in the DB:
ALTER TABLE Attachments ADD COLUMN ContentIndex tsvector
Going to run the following in the DB:
CREATE INDEX ContentIndex_idx ON Attachments USING gist(ContentIndex)
You can now configure RT to use the newly-created full-text index by
adding the following to your RT_SiteConfig.pm:
Set( %FullTextSearch,
Enable => 1,
Indexed => 1,
Column => 'ContentIndex',
Table => 'Attachments',
);
At this point, your database has been configured. Ensure that you add
the Set(...)
block that rt-setup-fulltext-index
output (which may be
different from the above!) to your RT_SiteConfig.pm
before continuing.
Your index now exists, but is empty. In order to fill it, we must find
and index every textual attachment in RT using rt-fulltext-indexer
--all
. This can be a very time-consuming task, so be prepared for it
to take a while. It can be interrupted and resumed safely, however.
Once it has finished, you will need to schedule rt-fulltext-indexer
to
run at regular intervals to pick up new additions to the database. This
is easiest done by running ln -s /opt/rt4/sbin/rt-fulltext-indexer
/etc/cron.hourly
, which will ensure that it runs once an hour, but many
other alternatives are possible. Depending on how fast tickets are
created in your system, running it as frequently as once every two to
five minutes may be possible.
MySQL configuration
The story on MySQL is unfortunately considerably more complicated, as
MySQL does not natively support a full-text index format. It integrates
with the external Sphinx search library, but
this requires recompiling MySQL to include the SphinxSE engine which
allows MySQL queries to retrieve data from the Sphinx searchd
indexer.
While many vendors package the external Sphinx tools, none of them ships
a MySQL package which has been compiled with SphinxSE at the time of
writing. As such, this is most likely not for the faint of heart, or
those only just getting their feet wet with system administration.
Despite that, the process of recompiling MySQL with SphinxSE is fairly straightforward. Full instructions are provided in the Sphinx documentation. Sphinx 2.0.1 has been tested to work with both MySQL 5.0 and 5.1; it is not compatible with MySQL 5.5 at this time.
Once a SphinxSE-enabled MySQL has been installed, the configuration
proceeds roughly as for PostgreSQL, above. First, you must run
rt-setup-fulltext-index
, which will ensure that your MySQL is
correctly configured with SphinxSE, then prompt you for additional
information:
$ /opt/rt4/sbin/rt-setup-fulltext-index
Enter name of a new MySQL table that will be used to connect to the
Sphinx server:
[AttachmentsIndex]:
Enter URL of the sphinx search server; this should be of the form
sphinx://<server>:<port>/<index name>
[sphinx://localhost:3312/rt]:
In order to communicate with the Sphinx searchd
server, we create a
virtual SphinxSE table in the database. This table hence needs to know
where your Sphinx searchd
server is running. The default is usually
correct if you are planning to run searchd
on your database server on
the default port.
Maximum number of matches to return; this is the maximum number of
attachment records returned by the search, not the maximum number
of tickets. Both your RT_SiteConfig.pm and your sphinx.conf must
agree on this value. Larger values cause your Sphinx server to
consume more memory and CPU time per query.
[10000]:
This is an unfortunate limitation imposed by the design of Sphinx; 10000
should be sufficient for most purposes, but you may need to increment
this number if you have tickets with an extremely large number of
transactions which match your common search criteria. If you wish to
change this number later, you must alter it in both RT_SiteConfig.pm
and sphinx.conf
files.
Going to run the following in the DB:
CREATE TABLE AttachmentsIndex (
id INTEGER UNSIGNED NOT NULL,
weight INTEGER NOT NULL,
query VARCHAR(3072) NOT NULL,
INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:3312/rt" CHARACTER SET utf8
You can now configure RT to use the newly-created full-text index by
adding the following to your RT_SiteConfig.pm:
Set( %FullTextSearch,
Enable => 1,
Indexed => 1,
Table => 'AttachmentsIndex',
MaxMatches => '10000',
);
It will also output a sample sphinx.conf
file. At this point, your
database has been configured. Ensure that you add the Set(...)
block
that rt-setup-fulltext-index
output (which may be different from the
above!) to your RT_SiteConfig.pm
before continuing.
Your index now exists, but is empty. In order to fill it, we must run the Sphinx indexer:
$ indexer rt
Sphinx 2.0.1-beta (r2792)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinx.conf'...
indexing index 'rt'...
collected 133809 docs, 1015.6 MB
sorted 137.2 Mhits, 100.0% done
total 133809 docs, 1015645236 bytes
total 144.967 sec, 7006036 bytes/sec, 923.02 docs/sec
total 400 reads, 6.474 sec, 637.6 kb/call avg, 16.1 msec/call avg
total 582 writes, 2.473 sec, 959.3 kb/call avg, 4.2 msec/call avg
Finally, you will need to start the searchd
daemon:
$ searchd
Sphinx 2.0.1-beta (r2792)
Copyright (c) 2001-2011, Andrew Aksyonoff
Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinx.conf'...
listening on all interfaces, port=3312
precaching index 'rt'
precached 1 indexes in 0.033 sec
You will need to re-index the data using indexer --rotate rt
at
regular intervals. Since Sphinx does not do incremental indexing, but
rather re-indexes all content each time it is run, a "main+delta"
split of the
data may be necessary to reduce index update time sufficiently to allow
close to real-time indexing.