We have used Thinking-Sphinx in several projects to implement search engines. Thinking-Sphinx is a wrapper in Ruby to use ActiveRecord with Sphinx. But in the project I’m working right now I had to solve a different problem: the user should be able to search for formatted values, however without using the formatting characters, i.e, suppose a field containing 111.444.777-35, the user should be able to find it searching for 11144477735 (or a substring of it).

The default behaviour for Thinking-Sphinx is to index the exact values for each column it scans, but we can by-pass it. The Thinking-Sphinx index definition allows us to use raw SQL to define which values should be indexed for each attribute. In my case, I used the native PostgreSQL regular expressions functions to get rid of the punctuation characters, indexing only the “clean” version of each value. My database column is called “entity_number”:

define_index

Continue reading