|
@@ -0,0 +1,179 @@
|
|
|
+
|
|
|
+=== Sorting multi-value buckets
|
|
|
+
|
|
|
+Multi-value buckets -- like the `terms`, `histogram` and `date_histogram` --
|
|
|
+dynamically produce many buckets. How does Elasticsearch decide what order
|
|
|
+these buckets are presented to the user?
|
|
|
+
|
|
|
+By default, buckets are ordered by `doc_count` in descending order. This is a
|
|
|
+good default because often we want to find the documents that maximize some
|
|
|
+criteria: price, population, frequency.
|
|
|
+
|
|
|
+But sometimes you'll want to modify this sort order, and there are a few ways to
|
|
|
+do it depending on the bucket.
|
|
|
+
|
|
|
+==== Intrinsic sorts
|
|
|
+
|
|
|
+These sort modes are "intrinsic" to the bucket...they operate on data that bucket
|
|
|
+generates such as `doc_count`. They share the same syntax but differ slightly
|
|
|
+depending on the bucket being used.
|
|
|
+
|
|
|
+Let's perform a `terms` aggregation but sort by `doc_count` ascending:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+GET /cars/transactions/_search?search_type=count
|
|
|
+{
|
|
|
+ "aggs" : {
|
|
|
+ "colors" : {
|
|
|
+ "terms" : {
|
|
|
+ "field" : "color",
|
|
|
+ "order": {
|
|
|
+ "_count" : "asc" <1>
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// SENSE: 300_Aggregations/50_sorting_ordering.json
|
|
|
+<1> Using the `_count` keyword, we can sort by `doc_count` ascending
|
|
|
+
|
|
|
+We introduce a "order" object into the aggregation, which allows us to sort on
|
|
|
+one of several values:
|
|
|
+
|
|
|
+- `_count`: Sort by document count. Works with `terms`, `histogram`, `date_histogram`
|
|
|
+- `_term`: Sort by the string value of a term alphabetically. Works only with `terms`
|
|
|
+- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`).
|
|
|
+Works only with `histogram` and `date_histogram`
|
|
|
+
|
|
|
+==== Sorting by a metric
|
|
|
+
|
|
|
+Often, you'll find yourself wanting to sort based on a metric's calculated value.
|
|
|
+For our car sales analytics dashboard, we may want to build a bar chart of
|
|
|
+sales by car color, but order the bars by the average price ascending.
|
|
|
+
|
|
|
+We can do this by adding a metric to our bucket, then referencing that
|
|
|
+metric from the "order" parameter:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+GET /cars/transactions/_search?search_type=count
|
|
|
+{
|
|
|
+ "aggs" : {
|
|
|
+ "colors" : {
|
|
|
+ "terms" : {
|
|
|
+ "field" : "color",
|
|
|
+ "order": {
|
|
|
+ "avg_price" : "asc" <2>
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "aggs": {
|
|
|
+ "avg_price": {
|
|
|
+ "avg": {"field": "price"} <1>
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// SENSE: 300_Aggregations/50_sorting_ordering.json
|
|
|
+<1> The average price is calculated for each bucket
|
|
|
+<2> Then the buckets are ordered by the calculated average in ascending order
|
|
|
+
|
|
|
+This lets you over-ride the sort order with any metric, simply by referencing
|
|
|
+the name of the metric. Some metrics, however, emit multiple values. The
|
|
|
+`extended_stats` metric is a good example: it provides half a dozen individual
|
|
|
+metrics.
|
|
|
+
|
|
|
+[INFO]
|
|
|
+.Applicable buckets
|
|
|
+====
|
|
|
+Metric-based sorting works with `terms`, `histogram` and `date_histogram`
|
|
|
+====
|
|
|
+
|
|
|
+If you want to sort on a multi-value metric, you just need to use the fully-qualified
|
|
|
+dot path:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+GET /cars/transactions/_search?search_type=count
|
|
|
+{
|
|
|
+ "aggs" : {
|
|
|
+ "colors" : {
|
|
|
+ "terms" : {
|
|
|
+ "field" : "color",
|
|
|
+ "order": {
|
|
|
+ "stats.variance" : "asc" <1>
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "aggs": {
|
|
|
+ "stats": {
|
|
|
+ "extended_stats": {"field": "price"}
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// SENSE: 300_Aggregations/50_sorting_ordering.json
|
|
|
+<1> Using dot notation, we can sort on the metric we are interested in
|
|
|
+
|
|
|
+In this example we are sorting on the variance of each bucket, so that colors
|
|
|
+with the least variance in price will appear before those that have more variance.
|
|
|
+
|
|
|
+==== Sorting based on "deep" metrics
|
|
|
+
|
|
|
+In the prior examples, the metric was a direct child of the bucket. An average
|
|
|
+price was calculated for each term. It is possible to sort on "deeper" metrics,
|
|
|
+which are grandchildren or great-grandchildren of the bucket...with some limitations.
|
|
|
+
|
|
|
+You can define a path to a deeper, nested metric using angle brackets (`>`), like
|
|
|
+so: `my_bucket>another_bucket>metric`
|
|
|
+
|
|
|
+The caveat is that each nested bucket in the path must be a "single value" bucket.
|
|
|
+A `filter` bucket produces a single bucket: all documents which match the
|
|
|
+filtering criteria. Multi-valued buckets (such as `terms`) generate many
|
|
|
+dynamic buckets, which makes it impossible to specify a deterministic path.
|
|
|
+
|
|
|
+Currently there are only two single-value buckets: `filter` and `global`. As
|
|
|
+a quick example, let's build a histogram of car prices, but order the buckets
|
|
|
+by the variance in price of red and green (but not blue) cars in each price range.
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+GET /cars/transactions/_search?search_type=count
|
|
|
+{
|
|
|
+ "aggs" : {
|
|
|
+ "colors" : {
|
|
|
+ "histogram" : {
|
|
|
+ "field" : "price",
|
|
|
+ "interval": 20000,
|
|
|
+ "order": {
|
|
|
+ "red_green_cars>stats.variance" : "asc" <1>
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "aggs": {
|
|
|
+ "red_green_cars": {
|
|
|
+ "filter": { "terms": {"color": ["red", "green"]}}, <2>
|
|
|
+ "aggs": {
|
|
|
+ "stats": {"extended_stats": {"field" : "price"}} <3>
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// SENSE: 300_Aggregations/50_sorting_ordering.json
|
|
|
+<1> Sort the buckets generated by the histogram according to the variance of a nested metric
|
|
|
+<2> Because we are using a single-value `filter`, we can use nested sorting
|
|
|
+<3> Sort on the stats generated by this metric
|
|
|
+
|
|
|
+In this example, you can see that we are accessing a nested metric. The `stats`
|
|
|
+metric is a child of `red_green_cars`, which is in turn a child of `colors`. To
|
|
|
+sort on that metric, we define the path as `"red_green_cars>stats.variance"`.
|
|
|
+This is allowed because the `filter` bucket is a single-valued bucket.
|
|
|
+
|
|
|
+
|
|
|
+
|