Indexing


Config

To start working with a Bluge index, begin by creating the appropriate Config structure. To create a default Config for an index stored on the filesystem, where path is the index directory, use the following:

config := bluge.DefaultConfig(path)

Writers

Next, to modify an index with this config structure, we need to open a Writer, which can be done as follows:

writer, err := bluge.OpenWriter(config)
if err != nil {
	log.Fatalf("error opening writer: %v", err)
}

Writers hold an exclusive lock on their underlying directory, which prevents other processes from opening a Writer while this one is still open. This does not affect Readers that are already open, and it does not prevent new Readers from being opened, but it does mean care should be taken to close the Writer when you are done:

defer func() {
	err = writer.Close()
	if err != nil {
		log.Fatalf("error closing writer: %v", err)
	}
}()

Now that we have an open Writer, we can add Documents to the index.

In Bluge, a Document is simply a collection of Fields. All Fields have a string identifier we call the field name. There is no requirement that Documents have Fields with any special names, but as a convention it is useful for Documents to have a common field named _id. To aid in following this convention, a helper method exists to create documents with this field:

doc := bluge.NewDocument("a")

Typically, an application will add other fields to the document. Here is a simple example adding a text field:

doc.AddField(bluge.NewTextField("name", "bluge"))

Now our document is ready to be placed into the index. The most common way to add or replace a document in the index is the Update method:

err = writer.Update(doc.ID(), doc)
if err != nil {
	log.Fatalf("error updating document: %v", err)
}

The Update method takes two arguments, the first is an identifier Term, and the second is a Document. The identifier Term tells Bluge how to identify which documents this will replace. The Document contains a helper method ID() to return the identifier used when calling NewDocument(id). In this example, we are updating the document identified by the field _id and value a. By using the Update method we ensure there is only ever one document with this identifier.

For advanced users, an Insert method is offered which takes only the Document parameter. It should be used only when it is known that there is no existing Document with the same identifier.
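
To make the difference between Update and Insert concrete, here is a toy in-memory model written with only the standard library. The toyDoc and toyIndex types are hypothetical and do not reflect Bluge's internals. They only mirror the behaviour described above: Update removes whatever matches the identifier before adding the new document, while Insert adds without checking.

```go
package main

import "fmt"

// toyDoc is a hypothetical stand-in for a document: an identifier plus one field.
type toyDoc struct {
	ID   string
	Name string
}

// toyIndex is a hypothetical in-memory model, not Bluge's implementation.
type toyIndex struct {
	docs []toyDoc
}

// Insert appends the document without checking for an existing identifier.
func (ix *toyIndex) Insert(d toyDoc) {
	ix.docs = append(ix.docs, d)
}

// Delete removes every document whose identifier matches id.
func (ix *toyIndex) Delete(id string) {
	kept := ix.docs[:0]
	for _, d := range ix.docs {
		if d.ID != id {
			kept = append(kept, d)
		}
	}
	ix.docs = kept
}

// Update removes any documents matching id, then adds the new document.
func (ix *toyIndex) Update(id string, d toyDoc) {
	ix.Delete(id)
	ix.Insert(d)
}

// Count reports how many documents carry the given identifier.
func (ix *toyIndex) Count(id string) int {
	n := 0
	for _, d := range ix.docs {
		if d.ID == id {
			n++
		}
	}
	return n
}

func main() {
	var viaInsert, viaUpdate toyIndex
	for i := 0; i < 2; i++ {
		d := toyDoc{ID: "a", Name: "bluge"}
		viaInsert.Insert(d)
		viaUpdate.Update(d.ID, d)
	}
	fmt.Println("after two Inserts:", viaInsert.Count("a")) // prints 2
	fmt.Println("after two Updates:", viaUpdate.Count("a")) // prints 1
}
```

In this model, Insert skips the lookup for existing documents, which is why it is only safe when the identifier is known to be new.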

Another important capability is to Delete documents from the index. We can delete the document we just updated using:

err = writer.Delete(doc.ID())
if err != nil {
	log.Fatalf("error deleting document: %v", err)
}

The Delete method takes just one parameter. Documents matching the identifying Term are removed.

Batches

In Bluge, higher throughput can be achieved by indexing Documents in larger batches. Batches also provide atomicity: either all changes in a batch are applied together, or none of them are applied and an error is returned.

To create a batch:

batch := bluge.NewBatch()

Batches offer the same basic operations as the Writer:

batch.Insert(doc)
batch.Update(doc.ID(), doc)
batch.Delete(doc.ID())

When you are ready to execute a Batch:

err = writer.Batch(batch)
if err != nil {
	log.Fatalf("error executing batch: %v", err)
}

It is important for applications to not operate on the same document multiple times in a batch. For example, one should not Update and Delete the same identifier in a single batch.
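
One way an application might guard against this is to check its planned operations for repeated identifiers before adding them to a batch. The sketch below uses only the standard library. The plannedOp type and the duplicateIDs helper are hypothetical names introduced for illustration and are not part of Bluge.

```go
package main

import "fmt"

// opKind names the three batch operations described above.
type opKind int

const (
	opInsert opKind = iota
	opUpdate
	opDelete
)

// plannedOp is a hypothetical record of one operation the application
// intends to place into a batch.
type plannedOp struct {
	Kind opKind
	ID   string
}

// duplicateIDs returns each identifier that appears in more than one
// planned operation, in the order the repetition is first seen.
func duplicateIDs(ops []plannedOp) []string {
	seen := make(map[string]int)
	var dups []string
	for _, op := range ops {
		seen[op.ID]++
		if seen[op.ID] == 2 {
			dups = append(dups, op.ID)
		}
	}
	return dups
}

func main() {
	ops := []plannedOp{
		{Kind: opUpdate, ID: "a"},
		{Kind: opInsert, ID: "b"},
		{Kind: opDelete, ID: "a"}, // same identifier as the first operation
	}
	fmt.Println(duplicateIDs(ops)) // prints [a]
}
```

If the check reports any identifiers, the application can collapse those operations into one, or move the later ones into a following batch.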

After a batch has completed execution (that is, the Batch() method has returned), it can be reused by invoking the Reset() method:

batch.Reset()
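
Batching and Reset combine naturally when a large set of documents is processed in fixed-size chunks with a single reusable batch. The following is a minimal stand-alone sketch of that loop using only the standard library. The toyBatch type and the indexInChunks helper are hypothetical. In a real program the execute callback would be where writer.Batch(batch) is called, and the batch would be the one returned by bluge.NewBatch().

```go
package main

import "fmt"

// toyBatch is a hypothetical stand-in for a batch; it only records identifiers.
type toyBatch struct {
	ids []string
}

// Update records one operation in the batch.
func (b *toyBatch) Update(id string) { b.ids = append(b.ids, id) }

// Reset empties the batch so it can be reused.
func (b *toyBatch) Reset() { b.ids = b.ids[:0] }

// indexInChunks feeds ids through one reusable batch, executing it after
// every size identifiers and once more for any remainder. It returns the
// size of each batch that was executed successfully.
func indexInChunks(ids []string, size int, execute func(*toyBatch) error) ([]int, error) {
	if size <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", size)
	}
	var sizes []int
	batch := &toyBatch{}
	flush := func() error {
		if len(batch.ids) == 0 {
			return nil
		}
		if err := execute(batch); err != nil {
			return err
		}
		sizes = append(sizes, len(batch.ids))
		batch.Reset() // reuse the same batch for the next chunk
		return nil
	}
	for _, id := range ids {
		batch.Update(id)
		if len(batch.ids) == size {
			if err := flush(); err != nil {
				return sizes, err
			}
		}
	}
	err := flush()
	return sizes, err
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	sizes, err := indexInChunks(ids, 2, func(b *toyBatch) error {
		return nil // a real program would execute the batch here
	})
	fmt.Println(sizes, err) // prints [2 2 1] <nil>
}
```

Because each identifier appears in only one chunk here, no batch operates on the same document twice.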