Last week, I started coding the admin page for Global Search. Here are the three indexing configurations that I plan to implement:

  • Adding new documents (written so that indexing resumes from the previous run).

  • Deleting the index (a quick sketch of this is shown right after the list).

  • Updating the index for records that have changed.
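For the second item, deleting the whole index essentially boils down to a single delete-by-query call followed by a commit. A minimal sketch, assuming $client is an already configured SolrClient instance:

// Remove every document from the index and make the change visible.
// Assumes $client is a configured SolrClient instance.
$client->deleteByQuery('*:*');
$client->commit();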

For updating the index when a record is updated or changed, Solr gives us two options:

  • Treat the “updated” record as a whole new SolrDocument and re-index the complete document.

  • Perform a partial (atomic) update by re-indexing only the field that was updated.

The first approach outlined above is pretty simple. The iterator returns a recordset of records whose timemodified is newer than the previous index run, and those records are re-indexed accordingly (as implemented earlier by my mentor Tomasz; see the wiki).

// Returns a recordset of all records modified since the given timestamp,
// so that indexing can resume from the previous run.
function mod_get_search_iterator($from = 0) {
    global $DB;
    $sql = "SELECT id, modified FROM {mod_table} WHERE modified >= ? ORDER BY modified ASC";
    return $DB->get_recordset_sql($sql, array($from));
}
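Each record returned by the iterator is then converted into a document and re-added, which makes Solr replace the existing document with the same unique id. A rough sketch of that loop, assuming a configured SolrClient instance $client and a hypothetical $lastindexrun timestamp (field names here are illustrative, not the actual Global Search API):

$rs = mod_get_search_iterator($lastindexrun);
foreach ($rs as $record) {
    // Build a fresh document for each modified record and re-index it.
    $doc = new SolrInputDocument();
    $doc->addField('id', $record->id);
    $doc->addField('modified', $record->modified);
    $client->addDocument($doc);
}
$rs->close();
$client->commit();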

The second approach, atomic (partial) updates, was added to Solr fairly recently. It can be very useful when thousands of documents have been updated at once and re-indexing them all with the first approach would take a lot of time.

Let's take an example. Suppose we have 1000 books in Moodle stored in courseid 1, and the teacher/admin imports all of them into another course, say courseid 2. Re-indexing all 1000 books from scratch is not very useful here; all we need to do is update the 'courseid' field of each book.

Solr supports several modifiers that atomically update values of a document.

  • set – set or replace a particular value, or remove the value if null is specified as the new value

  • add – adds an additional value to a list

  • inc – increments a numeric value by a specific amount
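To illustrate, an atomic update document using all three modifiers would look roughly like the XML fragment below (the courseid, tags, and viewcount fields are made up for the example):

<add>
  <doc>
    <field name="id">42</field>
    <field name="courseid" update="set">2</field>
    <field name="tags" update="add">imported</field>
    <field name="viewcount" update="inc">1</field>
  </doc>
</add>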

However, the PHP Solr extension has no dedicated method for atomic updates; they can only be expressed in XML or JSON. Hence, I will have to use the SolrClient::request() function to send a raw XML update request to the Solr server. Here is sample PHP code for building such a request:

// Build a raw XML update request that sets courseid to 2
// for the documents with ids 1..1000.
$s = '<add>';
for ($id = 1; $id <= 1000; $id++) {
    $s .= '<doc>';
    $s .= '<field name="id">' . $id . '</field>';
    // update="set" makes this an atomic update of a single field.
    $s .= '<field name="courseid" update="set">2</field>';
    $s .= '</doc>';
}
$s .= '</add>';

The request is then sent to Solr with the following calls:

$client->request($s);
$client->commit();
$client->optimize();

One thing to keep in mind is that the request string must stay within the upload limit defined in solrconfig.xml (multipartUploadLimitInKB="2048000"). Running the above code produced a string of roughly 80 KB, so we can comfortably use this approach to update fields across a large set of documents.
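If a batch ever did approach that limit, the updates could simply be sent in smaller chunks. A rough sketch of how that might look (the chunk size of 200 is an arbitrary assumption):

// Send the atomic updates in chunks so each raw request stays
// well below the configured upload limit.
$ids = range(1, 1000);
foreach (array_chunk($ids, 200) as $chunk) {
    $s = '<add>';
    foreach ($chunk as $id) {
        $s .= '<doc>';
        $s .= '<field name="id">' . $id . '</field>';
        $s .= '<field name="courseid" update="set">2</field>';
        $s .= '</doc>';
    }
    $s .= '</add>';
    $client->request($s);
}
$client->commit();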

However, I still have to discuss with my mentors how this second approach should be implemented in Global Search, which I will probably do this week.