iOS full-text search with Core Data and SQLite

24 January 2013

Wolfert de Kraker

by Wolfert de Kraker

This article will discuss the implementation of SQLite’s FTS engine in an Core Data-based iOS application.

Introduction

In a full-text search, FTS for short, the search engine examines all of the words in every stored document as it tries to match search criteria. Unfortunately, Core Data, iOS’s native persistence framework, does not offer support for FTS, however Core Data internally relies on an SQLite database for storage on iOS devices. This allows for the usage of SQLite’s FTS4 engine.

To implement this solution I wrote a small app with a simplified data structure similar to that of one of our client apps. For searching, this meant it should search through at least 10,000 documents on a title and content properties where the content would be filled with HTML-rich text. A dataset that closely resembled this is one from Stackoverflow, publicly available at http://blog.stackoverflow.com/category/cc-wiki-dump. I took the users.xml file. It contains about 300,000 users where the fields displayName and aboutMe(HTML rich-text) closely resemble the content in our client’s app.

The app displays a list of Stackoverflow user accounts in a tableView. The users are searchable through the tableView’s searchbar.

Global architecture

In the app I will implement FTS using SQLite’s FTS functionality. To simplify communication between the app and its database, FMDB will be used, FMDB is an Objective-C wrapper around SQLite. The virtual table is created in the same SQLite database in wich the Core Data content resides. To keep this table as light as possible only object properties relevant to the search query are inserted.

Search results are sorted on their relevance, which is calculated by SQLite in a custom ranking function. The custom function is added to SQLite using the provided C API. Objects in the resultset contain an identifier corresponding to the Core Data object. The relevant Core Data objects are fetched using an array containg the IDs retrieved by the FTS query. Results are displayed by a searchDisplayController in a tableView, ordered(DESC) based on their relevance ranking to the search query.

Global architecture

Pre-requisites:

To use the SQLite FTS engine it is necessary to include the SQLite library in the iOS project, and since there are no iOS specific versions of SQLite it’s necessary to add it as a static library and compile it. Here is how it can be done:

  • Download the SQLite source from http://www.sqlite.org/download.html

  • Open Xcode, create a new Project

  • Under the iOS section, select Framework & Library, create an new Cocoa Touch Static library.

  • Add the SQLite source files (sqlite3.c and sqlite3.h) to the project

  • In project build settings, make sure the compiler is set to the LLVM compiler 4.1

  • In project build settings, add the following LLVM preprocessor macros:

    • For debug: SQLITE_DEBUG, SQLITE_ENABLE_FTS3_PARENTHESIS and SQLITE_ENABLE_FTS3

    • For release: SQLITE_ENABLE_FTS3_PARENTHESIS and SQLITE_ENABLE_FTS3

  • Add this Xcode framework-project as a whole to the Xcode iOS app-project.

For ease of use I also included FMDB to interface with SQLite.

  • Download FMDB from github https://github.com/ccgus/fmdb

  • Add it to the iOS project in Xcode

  • Add its dependency, FMDB requires libsqlite3.dylib to be included with the project.

    • In project build phases under `Link binary with libraries'.

Implementation

This paragraph will describe the steps necessary to implement FTS. The virtual table should be created and filled simultaneously to the initial creation of the Core Data-database. When creating the virtual table specify that FTS is used:

BOOL tableCreated = [_db executeUpdate:@"CREATE VIRTUAL TABLE ZUSERINDEX USING fts4(displayName, aboutMe, id);"];

The _db variable is an instance of FMDatabase. For convenience this instance could be placed in an Singleton object so that is accessible throughout the app.

From the searchDisplayController delegate method shouldReloadTableForSearchString I trigger the custom search method which executes an FTS select query on the virtual table.

_result = [_db executeQuery:[NSString stringWithFormat:@"SELECT displayName, id FROM ZUSERINDEX WHERE ZUSERINDEX MATCH '*%@*' ORDER BY CustomRank(matchinfo(ZUSERINDEX)) DESC;", searchText]];

The function CustomRank is added to SQLite using following statement:

sqlite3_create_function(db, "CustomRank", 1, SQLITE_UTF8, NULL, &CustomRank, NULL, NULL);

The db variable is the sqlite handle, which is exposed as a property(sqliteHandle) by FMDB. Following arguments are the name of the function, the number of arguments it takes, and the function itself. Note that newer versions of FMDB also support the adding of functions via blocks, using makeFunctionNamed:maximumArguments:withBlock:.

The ranking function takes the matchinfo() results as argument. Matchinfo is an auxiliary function provided by SQLite which contains (amongst others), the column in which the phrase hit a match and the number of phrases:

static void CustomRank(sqlite3_context *context, int argc, sqlite3_value **argv) {
	if (argc != 1)
		sqlite3_result_null(context);
	if (sqlite3_value_type(argv[0]) == SQLITE_NULL)
		sqlite3_result_null(context);

	int weight = 0;
	unsigned int *blob = (unsigned int *)sqlite3_value_blob(argv[0]);

	unsigned int numberOfPhrases = blob[0];
	unsigned int numberOfColumns = blob[1];

	// Ranking details left out.

	sqlite3_result_int(context, weight);
}

For each matching phrase in each column the weight is increased. Our ranking function CustomRank is written in C. It is of course possible to write your custom function in Objective-C, but it gets a bit more tricky to convert the arguments to the right types.

This returns a FMDBResultset with the id column sorted by relevance as I defined it. I added the results in a NSArray.

FMResultSet *rs = [_database executeQuery:SQLFetchRequest];
_displayNameArray = [NSMutableArray array];
_idArray = [NSMutableArray array];
while ([rs next]) {
	[_displayNameArray addObject:[rs stringForColumn:@"displayname"]];
	[_idArray addObject:[rs stringForColumn:@"id"]];
}

The TableViewDatasource can be implemented to use these arrays as datasource and display their content in cells based on indexPath.row. When a cell is selected I query Core Data for the actual object based on the object id from _idArray using the following predicate:

[NSPredicate predicateWithFormat:@"identifier = %@", [_idArray objectAtIndex:indexPath.row]];

The result is a tableView with search option that responsively displays full-text search results in order of their relevance.

Performance

There are some performance issues in the implementation described above. This becomes apparent when the query is fired with only a single character. All documents may contain the character `e'. Calculating the ranks of the resultset from this query can take up to 1.5 seconds on older devices. A solution would be to not rank the results of single character queries, or only to fire the query after the second character was given.

The screenshots below show the perfomance gain over the Core Data queries:

FTS Match performance

FTS Match memory usage

SQLites Full-text search.

Coredata CONTAINS\[\cd performance]

Coredata CONTAINS\[cd] memory usage

Core Data with contains[cd] predicate.

97.8% CPU usage vs 87.9% CPU usage, an almost 10% decrease. Profiler results of memory usage see an allocation decrease from 131.56 to 19.85 MB, at least a 660% decrease. In practise this means that the app will remain responsive while searching.

Conclusion

Although this implementation may not be as elegant when compared to Core Data-based search implementions, this SQLite implementation offers something Core Data does not: full-text search. Next to that, it performs almost 10% faster and at least 660% more (memory) efficiently than a comparable Core Data query.

Further thoughts

Core Data could be left out of this implementation, SQLite offers comparable functionality, improved performance and FTS. This would mean that the tableViewDatasource would have to be adjusted to map the FMDB resultset rather than the CoreData fetchRequest resultset.