Add sorting/ordering

Zachary Tong 11 years ago
parent
commit
8f50da5ec8
2 changed files with 182 additions and 1 deletions
  1. 179 0
      300_Aggregations/50_sorting_ordering.asciidoc
  2. 3 1
      303_Making_Graphs.asciidoc

+ 179 - 0
300_Aggregations/50_sorting_ordering.asciidoc

@@ -0,0 +1,179 @@
+
+=== Sorting multi-value buckets
+
+Multi-value buckets, such as `terms`, `histogram`, and `date_histogram`,
+dynamically produce many buckets.  How does Elasticsearch decide the order in
+which these buckets are presented to the user?
+
+By default, `terms` buckets are ordered by `doc_count` in descending order.
+This is a good default because we often want to find the documents that
+maximize some criterion: price, population, frequency.  (The `histogram` and
+`date_histogram` buckets default to ascending key order instead.)
+
+But sometimes you'll want to modify this sort order, and there are a few ways to
+do it depending on the bucket.
+
+==== Intrinsic sorts
+
+These sort modes are "intrinsic" to the bucket: they operate on data that the
+bucket generates, such as `doc_count`.  They share the same syntax but differ
+slightly depending on the bucket being used.
+
+Let's perform a `terms` aggregation but sort by `doc_count` ascending:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "terms" : {
+              "field" : "color",
+              "order": {
+                "_count" : "asc" <1>
+              }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> Using the `_count` keyword, we can sort by `doc_count` ascending
+
+We introduce an "order" object into the aggregation, which allows us to sort on
+one of several values:
+
+- `_count`: Sort by document count.  Works with `terms`, `histogram`, and `date_histogram`.
+- `_term`: Sort alphabetically by the string value of a term.  Works only with `terms`.
+- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`).
+Works only with `histogram` and `date_histogram`.
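+
+The `_key` and `_term` modes follow the same pattern as `_count`.  As a minimal
+sketch, assuming the same `cars` index (the aggregation name `price_ranges` and
+the interval of 20000 are chosen here for illustration), a `histogram` can list
+its most expensive price ranges first:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "price_ranges" : {
+            "histogram" : {
+              "field" : "price",
+              "interval": 20000,
+              "order": {
+                "_key" : "desc" <1>
+              }
+            }
+        }
+    }
+}
+--------------------------------------------------
+<1> Using the `_key` keyword, we sort the buckets by their numeric key, descending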
+
+==== Sorting by a metric
+
+Often, you'll find yourself wanting to sort based on a metric's calculated value.
+For our car sales analytics dashboard, we may want to build a bar chart of
+sales by car color, but order the bars by the average price ascending.
+
+We can do this by adding a metric to our bucket, then referencing that
+metric from the "order" parameter:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "terms" : {
+              "field" : "color",
+              "order": {
+                "avg_price" : "asc" <2>
+              }
+            },
+            "aggs": {
+                "avg_price": {
+                    "avg": {"field": "price"} <1>
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> The average price is calculated for each bucket
+<2> Then the buckets are ordered by the calculated average in ascending order
+
+This lets you override the sort order with any metric, simply by referencing
+the name of the metric.  Some metrics, however, emit multiple values.  The
+`extended_stats` metric is a good example: it provides several individual
+values, including `avg`, `variance`, and `std_deviation`.
+
+[INFO]
+.Applicable buckets
+====
+Metric-based sorting works with `terms`, `histogram`, and `date_histogram`.
+====
+
+To sort on a multi-value metric, use the fully qualified dot path to the value
+you want:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "terms" : {
+              "field" : "color",
+              "order": {
+                "stats.variance" : "asc" <1>
+              }
+            },
+            "aggs": {
+                "stats": {
+                    "extended_stats": {"field": "price"}
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> Using dot notation, we can sort on the metric we are interested in
+
+In this example we are sorting on the variance of each bucket, so that colors
+with the least variance in price will appear before those that have more variance.
+
+==== Sorting based on "deep" metrics
+
+In the prior examples, the metric was a direct child of the bucket.  An average
+price was calculated for each term.  It is also possible to sort on "deeper"
+metrics, which are grandchildren or great-grandchildren of the bucket, with
+some limitations.
+
+You can define a path to a deeper, nested metric using the greater-than
+character (`>`), like so: `my_bucket>another_bucket>metric`.
+
+The caveat is that each nested bucket in the path must be a "single-value" bucket.
+A `filter` bucket produces a single bucket: all documents that match the
+filtering criteria.  Multi-value buckets (such as `terms`) generate many
+dynamic buckets, which makes it impossible to specify a deterministic path.
+
+Currently there are only two single-value buckets: `filter` and `global`.  As 
+a quick example, let's build a histogram of car prices, but order the buckets
+by the variance in price of red and green (but not blue) cars in each price range.
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "histogram" : {
+              "field" : "price",
+              "interval": 20000,
+              "order": {
+                "red_green_cars>stats.variance" : "asc" <1>
+              }
+            },
+            "aggs": {
+                "red_green_cars": { 
+                    "filter": { "terms": {"color": ["red", "green"]}}, <2>
+                    "aggs": {
+                        "stats": {"extended_stats": {"field" : "price"}} <3>
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> Sort the buckets generated by the histogram according to the variance of a nested metric
+<2> Because we are using a single-value `filter`, we can use nested sorting
+<3> Sort on the stats generated by this metric
+
+In this example, you can see that we are accessing a nested metric.  The `stats`
+metric is a child of `red_green_cars`, which is in turn a child of `colors`.  To
+sort on that metric, we define the path as `"red_green_cars>stats.variance"`.
+This is allowed because the `filter` bucket is a single-value bucket.
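+
+The same path syntax also works when the nested metric emits a single value, in
+which case no dot suffix is needed.  As a minimal sketch, assuming the same
+`cars` index (the aggregation names `price_ranges` and `avg_price` are chosen
+here for illustration), the following orders the price ranges by the average
+price of red and green cars, highest first:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "price_ranges" : {
+            "histogram" : {
+              "field" : "price",
+              "interval": 20000,
+              "order": {
+                "red_green_cars>avg_price" : "desc" <1>
+              }
+            },
+            "aggs": {
+                "red_green_cars": {
+                    "filter": { "terms": {"color": ["red", "green"]}},
+                    "aggs": {
+                        "avg_price": {"avg": {"field" : "price"}} <2>
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+<1> The path names the `filter` bucket, then the metric nested inside it
+<2> A single-value metric, so the path ends at the metric name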
+
+
+

+ 3 - 1
303_Making_Graphs.asciidoc

@@ -6,4 +6,6 @@ include::300_Aggregations/35_date_histogram.asciidoc[]
 
 include::300_Aggregations/40_scope.asciidoc[]
 
-include::300_Aggregations/45_filtering.asciidoc[]
+include::300_Aggregations/45_filtering.asciidoc[]
+
+include::300_Aggregations/50_sorting_ordering.asciidoc[]