Bluge supports a powerful framework for computing aggregated values over the set of documents matching your search.
In Bluge, the aggregation framework relies heavily on two concepts: Buckets and Metrics.
A Bucket is simply a set of documents matching some criteria.
A Metric is some value (or set of values) computed over a Bucket.
There is one implicit bucket defined, which is the entire result set of your search.
Some aggregations (which we refer to as bucketing aggregations) define new sub-buckets inside this top-level bucket. These sub-buckets can be either statically defined at search time or dynamically defined based on the data.
Other aggregations (which we refer to as metric aggregations) compute values on buckets.
The terms aggregation typically operates on field data. Each term seen becomes its own bucket, and by default the count metric is applied to each bucket. At the conclusion of the search, these buckets are sorted by count in descending order, and the top N buckets are returned as part of the result.
For example, consider a set of documents describing products. Each product has a keyword field named category, indexed with the sortable option. When a user searches the products, we can compute a terms aggregation on the category field and display the top 5 categories within their search results, along with a count of how many products are in each category. This is often used as a way for users to drill deeper into the results by interactively refining their search filter.
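The counting, sorting, and top-N steps described above can be sketched in plain Go. This is an illustration of the idea only, not Bluge's API: TermBucket, topTerms, and the sample category values are hypothetical names and data, and the alphabetical tie-break is an assumption added so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// TermBucket pairs a term with the number of documents that carried it.
// Hypothetical type for illustration; not part of Bluge.
type TermBucket struct {
	Term  string
	Count int
}

// topTerms gives each distinct term its own bucket, applies the count
// metric, sorts buckets by count descending (ties broken alphabetically,
// an assumption of this sketch), and keeps the top n.
func topTerms(terms []string, n int) []TermBucket {
	counts := make(map[string]int)
	for _, t := range terms {
		counts[t]++
	}
	buckets := make([]TermBucket, 0, len(counts))
	for term, count := range counts {
		buckets = append(buckets, TermBucket{Term: term, Count: count})
	}
	sort.Slice(buckets, func(i, j int) bool {
		if buckets[i].Count != buckets[j].Count {
			return buckets[i].Count > buckets[j].Count
		}
		return buckets[i].Term < buckets[j].Term
	})
	if len(buckets) > n {
		buckets = buckets[:n]
	}
	return buckets
}

func main() {
	// One category value per matching product (made-up data).
	categories := []string{"books", "games", "books", "toys", "books", "games"}
	for _, b := range topTerms(categories, 2) {
		fmt.Printf("%s: %d\n", b.Term, b.Count)
	}
}
```

A user interface could render each returned bucket as a clickable filter, which is the drill-down pattern described above.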
Numeric Range Aggregation
The numeric range aggregation also typically operates on field data. At query time, a set of buckets describing interesting numeric ranges is statically defined. By default, the aggregation includes the count metric, which tracks how many documents have a numeric field value within each range.
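As a rough illustration of statically defined range buckets, here is a plain-Go sketch that does not use Bluge's API. NumericRange, countByRange, and the price data are hypothetical, and treating each range as half-open (low inclusive, high exclusive) is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"math"
)

// NumericRange is a bucket defined up front, before any document is seen.
// The half-open interval [Low, High) is an assumption of this sketch.
type NumericRange struct {
	Name string
	Low  float64
	High float64
}

// countByRange returns, for each range, how many values fall inside it.
// Ranges may overlap, so one value can be counted in several buckets.
func countByRange(ranges []NumericRange, values []float64) []int {
	counts := make([]int, len(ranges))
	for _, v := range values {
		for i, r := range ranges {
			if v >= r.Low && v < r.High {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	ranges := []NumericRange{
		{Name: "cheap", Low: 0, High: 10},
		{Name: "mid", Low: 10, High: 100},
		{Name: "premium", Low: 100, High: math.Inf(1)},
	}
	prices := []float64{4.99, 12.50, 25, 99.99, 150} // made-up data
	counts := countByRange(ranges, prices)
	for i, r := range ranges {
		fmt.Printf("%s: %d\n", r.Name, counts[i])
	}
}
```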
Date Range Aggregation
The date range aggregation also typically operates on field data. At query time, a set of buckets describing interesting date ranges is statically defined. By default, the aggregation includes the count metric, which tracks how many documents have a date/time field value within each range.
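The same idea applies to dates. The following plain-Go sketch counts timestamps per statically defined range using only the standard library. DateRange, countByDateRange, and day are hypothetical helpers, not Bluge types, and the half-open interval (start inclusive, end exclusive) is again an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// DateRange is a bucket defined up front, covering [Start, End).
// The half-open interval is an assumption of this sketch.
type DateRange struct {
	Name  string
	Start time.Time
	End   time.Time
}

// countByDateRange returns, for each range, how many timestamps fall in it.
func countByDateRange(ranges []DateRange, times ...time.Time) []int {
	counts := make([]int, len(ranges))
	for _, t := range times {
		for i, r := range ranges {
			if !t.Before(r.Start) && t.Before(r.End) {
				counts[i]++
			}
		}
	}
	return counts
}

// day builds a UTC midnight timestamp for the given calendar date.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	ranges := []DateRange{
		{Name: "2023", Start: day(2023, 1, 1), End: day(2024, 1, 1)},
		{Name: "2024", Start: day(2024, 1, 1), End: day(2025, 1, 1)},
	}
	// Made-up document timestamps.
	counts := countByDateRange(ranges,
		day(2023, 3, 15), day(2023, 11, 2), day(2024, 6, 30))
	for i, r := range ranges {
		fmt.Printf("%s: %d\n", r.Name, counts[i])
	}
}
```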
The following basic single-value metrics are supported:
- weighted avg
A few special case aggregations are supported:
- count (sum of 1 per document)
- duration (time.Duration computed since the start of the search)
The cardinality estimation metric can be used to count the number of distinct values seen, in a memory-efficient way.
The quantile approximation metric can be used to approximate quantiles in a memory-efficient way.
Buckets and Metrics can be nested in arbitrary and powerful ways.
For example, imagine we have a set of documents describing beers. Each beer has a field named style describing the style (lager, ale, lambic, etc.). Each beer also has a numeric field named abv describing the beer’s alcohol by volume. One could run a MatchAll query across the beers, compute a Terms Aggregation on the style field, and then nest the Quantile Approximation metric inside each of those buckets. We could then report the median (50th percentile) and 99th percentile ABV for each style of beer.
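To make the nesting concrete, here is a plain-Go sketch of a metric computed inside each bucket. It does not use Bluge's API, and it swaps the quantile approximation for an exact nearest-rank quantile over sorted values, which is simpler to read but not memory-efficient. Beer, abvByStyle, quantile, and the sample data are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Beer is a stand-in for a matched document (hypothetical type).
type Beer struct {
	Style string
	ABV   float64
}

// abvByStyle plays the role of the bucketing aggregation: one bucket per
// style, each holding the ABV values of the beers that fell into it.
func abvByStyle(beers []Beer) map[string][]float64 {
	buckets := make(map[string][]float64)
	for _, b := range beers {
		buckets[b.Style] = append(buckets[b.Style], b.ABV)
	}
	return buckets
}

// quantile plays the role of the nested metric. It computes an exact
// nearest-rank quantile for 0 < q <= 1; this is NOT an approximation
// algorithm, just the simplest correct stand-in.
func quantile(values []float64, q float64) float64 {
	if len(values) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	beers := []Beer{ // made-up data
		{"lager", 4.2}, {"lager", 5.0}, {"lager", 4.8},
		{"ale", 5.5}, {"ale", 6.5}, {"ale", 7.0}, {"ale", 9.0},
	}
	buckets := abvByStyle(beers)
	styles := make([]string, 0, len(buckets))
	for s := range buckets {
		styles = append(styles, s)
	}
	sort.Strings(styles) // stable output order
	for _, s := range styles {
		fmt.Printf("%s: p50=%.1f p99=%.1f\n",
			s, quantile(buckets[s], 0.50), quantile(buckets[s], 0.99))
	}
}
```

The structure is the point here: the bucketing step knows nothing about quantiles, and the metric knows nothing about styles, so either can be replaced independently.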
All the aggregations discussed thus far operate on extensible interfaces, not directly on field values.
This allows aggregations to work on custom values computed by your application, which can themselves use field values as inputs.
It also allows for filtering out undesirable values, or replacing missing values with alternates.
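The following plain-Go sketch shows why an interface between metrics and field values is useful. It is a simplified stand-in and does not reproduce Bluge's actual interfaces: Doc, NumericSource, Field, Filter, Missing, and sum are hypothetical names invented for this illustration.

```go
package main

import "fmt"

// Doc is a stand-in for a matched document: field name -> numeric values.
type Doc map[string][]float64

// NumericSource is a hypothetical interface that yields the numeric values
// a metric should see for one document.
type NumericSource interface {
	Numbers(d Doc) []float64
}

// Field reads values straight from the named field.
type Field string

func (f Field) Numbers(d Doc) []float64 { return d[string(f)] }

// Filter drops values for which Keep returns false.
type Filter struct {
	Source NumericSource
	Keep   func(float64) bool
}

func (f Filter) Numbers(d Doc) []float64 {
	var out []float64
	for _, v := range f.Source.Numbers(d) {
		if f.Keep(v) {
			out = append(out, v)
		}
	}
	return out
}

// Missing substitutes Default when the wrapped source yields no values.
type Missing struct {
	Source  NumericSource
	Default float64
}

func (m Missing) Numbers(d Doc) []float64 {
	if vals := m.Source.Numbers(d); len(vals) > 0 {
		return vals
	}
	return []float64{m.Default}
}

// sum is a metric that only ever talks to the source, never to raw fields.
func sum(src NumericSource, docs []Doc) float64 {
	total := 0.0
	for _, d := range docs {
		for _, v := range src.Numbers(d) {
			total += v
		}
	}
	return total
}

func main() {
	docs := []Doc{
		{"abv": {5.0}},
		{"abv": {-1}}, // undesirable value, filtered out
		{},            // field missing entirely
	}
	nonNegative := func(v float64) bool { return v >= 0 }
	src := Missing{Source: Filter{Source: Field("abv"), Keep: nonNegative}, Default: 4.0}
	fmt.Println(sum(src, docs))
}
```

Because the metric depends only on the interface, filtering and missing-value replacement compose by wrapping one source in another, without any change to the metric itself.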
Extending the Framework
The Calculator types used to define all of this functionality are exposed as interfaces, giving your application the full power to define its own behavior.
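As a rough sketch of what a custom metric might look like, the plain-Go example below defines a small calculator interface and one implementation. The Calculator interface shown here is a hypothetical simplification made up for this illustration; Bluge's real interfaces and their method signatures are not reproduced. Spread and run are likewise invented names.

```go
package main

import "fmt"

// Calculator is a hypothetical, simplified interface: it consumes one
// value at a time and reports a result once all values have been seen.
type Calculator interface {
	Consume(v float64)
	Result() float64
}

// Spread is a custom metric: the difference between the largest and the
// smallest value seen. It reports 0 if no values were consumed.
type Spread struct {
	min, max float64
	seen     bool
}

func (s *Spread) Consume(v float64) {
	if !s.seen {
		s.min, s.max, s.seen = v, v, true
		return
	}
	if v < s.min {
		s.min = v
	}
	if v > s.max {
		s.max = v
	}
}

func (s *Spread) Result() float64 { return s.max - s.min }

// run feeds every value to the calculator and returns its result. The
// caller does not need to know which metric it is driving.
func run(c Calculator, values []float64) float64 {
	for _, v := range values {
		c.Consume(v)
	}
	return c.Result()
}

func main() {
	abvs := []float64{4.0, 9.0, 6.5} // made-up data
	fmt.Println(run(&Spread{}, abvs))
}
```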