ScaraBook, a better delicious

19 July 2009 mysql, webdev

How do you manage your bookmarks? I have used online services and applications installed on my own server. Despite all of them, out of convenience or laziness, I still end up going to a search engine. Is it possible to build an application that is just as convenient and fast?

Why am I not satisfied?

I tried del.icio.us, magnolia, furl, and various other online services. I installed scuttle on my server. I used freetag to write my own application. They are all excellent, but in the end I do not use them because, given my habits, I find them inconvenient both for searching and for entering bookmarks. I traced the problem to the use of tags. What I want is a search box: type a few words and get relevant results.

Full-text search

In the end, Google was my bookmark manager. I wanted to build something with the same immediacy of use. The problem was how to index the data. Anyone who works with databases knows that a substring search (something like ... WHERE field LIKE '%search%') is inefficient. The solution came from MySQL itself, which implements full-text indexes. With these indexes it is easy to build a Google-style search engine.

Limitations

By default, a full-text index only covers words of at least 4 characters. To index shorter words we need to change the ft_min_word_len parameter in the MySQL configuration file; after changing it, the server has to be restarted and existing full-text indexes rebuilt. This parameter strongly affects the size of the index. As an example, in another application of mine that uses this index type with a minimum length of 3 letters, on a database of about 96MB (over 23,000 records) the index has grown to 71MB. Searches are very fast.

Full-text indexes can be created only on MyISAM tables; they are not available on InnoDB tables.

Simplicity

The simplicity of the interface (a single search box) is matched by that of the database. All bookmarks are stored in a single table, and the full-text index used for searching is built on this table. All the additional tables and relationships needed to implement the usual tag-based designs go away.

Syntax

By default, MySQL returns all records that contain at least one of the search terms, sorted by relevance. This means that a search for “free image hosting” also returns records that contain only “hosting”, but records that contain all three words are sorted first. The top of the result list therefore holds the most relevant records.
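To make the single table and the default search concrete, here is a minimal sketch. It is written in Python rather than the PHP that ScaraBook uses, and the table name, column names and index name are my assumptions for illustration, not ScaraBook's actual schema. The function only builds the SQL and its parameters, in the `%s` placeholder style used by the common Python MySQL drivers.

```python
# Hypothetical schema: "bookmarks", its columns and "ft_search" are
# illustrative names, not taken from ScaraBook itself.
CREATE_TABLE_SQL = """
CREATE TABLE bookmarks (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url         VARCHAR(255) NOT NULL,
    title       VARCHAR(255) NOT NULL,
    description TEXT,
    keywords    VARCHAR(255),
    UNIQUE KEY uniq_url (url),
    FULLTEXT KEY ft_search (title, description, keywords)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
"""

# The column list in MATCH() mirrors the FULLTEXT index definition above.
# No ORDER BY: with MATCH() in the WHERE clause MySQL returns the rows
# sorted by relevance on its own.
SEARCH_SQL = (
    "SELECT url, title, description FROM bookmarks "
    "WHERE MATCH (title, description, keywords) AGAINST (%s) "
    "LIMIT %s"
)


def natural_language_search(text, limit=100):
    """Return (sql, params) for a default, natural-language full-text search."""
    return SEARCH_SQL, (text.strip(), int(limit))
```

With a driver cursor this would be used as `cursor.execute(*natural_language_search("free image hosting"))`.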

You can refine searches with Boolean operators, which narrow down the number of results considerably. For example, “+” requires that the word be present, while “-” requires that the term be absent. The Boolean search “+image +free -hosting” returns only records that contain both “image” and “free” but not “hosting”. The number of results will be much lower. This allows great flexibility in searches.
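As a rough illustration of the Boolean syntax, the sketch below builds such an expression from lists of words. It is Python rather than ScaraBook's PHP, and the function name, the query constant and the `bookmarks` table with its columns are assumptions of mine. Words are reduced to plain word characters so that user input cannot smuggle in extra operators.

```python
import re

# Keep only runs of word characters; this drops operator characters
# such as + - * ( ) " from user-supplied words.
_WORD = re.compile(r"\w+")


def boolean_expression(required=(), excluded=(), optional=()):
    """Build a MySQL boolean-mode search string such as '+image +free -hosting'."""
    parts = []
    for prefix, words in (("+", required), ("-", excluded), ("", optional)):
        for word in words:
            parts.extend(prefix + token for token in _WORD.findall(word))
    return " ".join(parts)


# Hypothetical query: table and column names are illustrative only.
BOOLEAN_SEARCH_SQL = (
    "SELECT url, title, description FROM bookmarks "
    "WHERE MATCH (title, description, keywords) "
    "AGAINST (%s IN BOOLEAN MODE)"
)
```

For the example above, `boolean_expression(required=["image", "free"], excluded=["hosting"])` produces the string `+image +free -hosting`, which is then passed as the single query parameter.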

No paging

Why paginate results? If you do not find what you are looking for in the first 10 or 20 results, then you ran the wrong search. Search results are not a photo album to browse; they are a tool for getting something done without wasting time. Google is forced to paginate because some searches return too many results, but a database of personal bookmarks will not return that many records. I prefer one page with 100 results rather than having to click through 10 pages of 10 results each. It is a personal preference, a design choice and not a limitation. At most I could decide to limit the results to the first 20 or 50.

Bookmarklet

Easy searching must be paired with equally easy entry of new bookmarks. This is why I made a bookmarklet that, as in any other bookmark manager, lets you add a new bookmark while surfing the web. If you highlight some text before using the bookmarklet, that text is automatically inserted in the description.

If the URL you want to save is already in the database, you will see the form for editing the existing record.
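The flow behind the bookmarklet can be sketched roughly as follows. This is not ScaraBook's code: it is Python instead of PHP and JavaScript, the parameter names `url`, `title` and `description` and the endpoint address are hypothetical, and a plain dictionary keyed by URL stands in for the database table.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def bookmarklet_request(base_url, page_url, page_title, selection=""):
    """Build the address the bookmarklet would open (hypothetical parameters)."""
    query = urlencode(
        {"url": page_url, "title": page_title, "description": selection}
    )
    return f"{base_url}?{query}"


def form_for(request_url, saved):
    """Decide which form to show: 'edit' for a known URL, 'new' otherwise.

    `saved` is a dict keyed by URL, standing in for the bookmarks table.
    """
    # parse_qs drops empty values, so missing fields fall back to "".
    params = {k: v[0] for k, v in parse_qs(urlsplit(request_url).query).items()}
    url = params["url"]
    if url in saved:
        return "edit", saved[url]
    return "new", {
        "url": url,
        "title": params.get("title", ""),
        "description": params.get("description", ""),
    }
```

The highlighted text travels in the `description` parameter, which is how it ends up pre-filled in the form for a new bookmark.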

The beauty is that there is no need to enter tags manually. The new record is indexed as soon as it is saved in the database, and it can be found by searching for words in the title or description. Adding extra keywords is optional and only useful when we want the record to be linked to terms that do not already appear in the title or description.

Tools used

The database is MySQL, since its full-text indexing is the foundation of ScaraBook.

The programming language is the “usual” PHP, and I took this opportunity to try the CodeIgniter framework on an application that was not the usual “blog in 10 minutes.”

To export my bookmarks from del.icio.us I used MySQLicious.

Demo

There is a freely accessible and usable demo of ScaraBook, with the bookmarklet enabled. Of course it is only a demo, and the database will be periodically reset.

Due to a hosting change, the demo is no longer available.
