8 years ago · ca205189d0
--- a/301_Aggregation_Overview.asciidoc
+++ b/301_Aggregation_Overview.asciidoc
@@ -1,18 +1,16 @@
 
				 [[aggs-high-level]]
			
 
				-== High-Level Concepts
			
 
				 
			
 
				-Like the query DSL, ((("aggregations", "high-level concepts")))aggregations have a _composable_ syntax: independent units
			
 
				-of functionality can be mixed and matched to provide the custom behavior that
			
 
				-you need. This means that there are only a few basic concepts to learn, but
			
 
				-nearly limitless combinations of those basic components.
			
 
				+== 高阶概念
			
 
				 
			
 
				-To master aggregations, you need to understand only two main concepts:
			
 
				+类似于 DSL 查询表达式，((("聚合", "高阶概念")))聚合也有 _可组合_ 的语法：独立单元的功能可以被混合起来提供你需要的自定义行为。这意味着只需要学习很少的基本概念，就可以得到几乎无尽的组合。
			
 
				 
			
 
				-_Buckets_:: Collections of documents that meet a criterion
			
 
				-_Metrics_:: Statistics calculated on the documents in a bucket
			
 
				+要掌握聚合，你只需要明白两个主要的概念：
			
 
				 
			
 
				-That's it!  Every aggregation is simply a combination of one or more buckets
			
 
				-and zero or more metrics. To translate into rough SQL terms:
			
 
				+ _桶（Buckets）_ :: 满足特定条件的文档的集合
			
 
				+
			
 
				+ _指标（Metrics）_ :: 对桶内的文档进行统计计算
			
 
				+
			
 
				+这就是全部了！每个聚合都是一个或者多个桶和零个或者多个指标的组合。用粗略的 SQL 术语来表达：
			
 
				 
			
 
				 [source,sql]
			
 
				 --------------------------------------------------
			
@@ -20,68 +18,53 @@ SELECT COUNT(color) <1>
 
				 FROM table
			
 
				 GROUP BY color <2>
			
 
				 --------------------------------------------------
			
 
				-<1> `COUNT(color)` is equivalent to a metric.
			
 
				-<2> `GROUP BY color` is equivalent to a bucket.
			
 
				+<1> `COUNT(color)` 相当于指标。
			
 
				+
			
 
				+<2> `GROUP BY color` 相当于桶。
			
 
				 
			
 
				-Buckets are conceptually similar to grouping in SQL, while metrics are similar
			
 
				-to `COUNT()`, `SUM()`, `MAX()`, and so forth.
			
 
				+桶在概念上类似于 SQL 的分组（GROUP BY），而指标则类似于 `COUNT()` 、 `SUM()` 、 `MAX()` 等统计方法。
			
 
				 
			
 
				 
			
 
				-Let's dig into both of these concepts((("aggregations", "high-level concepts", "buckets")))((("buckets"))) and see what they entail.
			
 
				+让我们深入了解这两个概念((("aggregations", "high-level concepts", "buckets")))((("buckets")))，看看它们各自包含什么。
			
 
				 
			
 
				 [role="pagebreak-before"]
			
 
				-=== Buckets
			
 
				+[[_buckets]]
			
 
				+=== 桶
			
 
				 
			
 
				-A _bucket_ is simply a collection of documents that meet certain criteria:
			
 
				+_桶_ 简单来说就是满足特定条件的文档的集合：
			
 
				 
			
 
				-- An employee would land in either the _male_ or _female_ bucket.
			
 
				-- The city of Albany would land in the _New York_ state bucket.
			
 
				-- The date 2014-10-28 would land within the _October_ bucket.
			
 
				+- 一个雇员属于 _男性_ 桶或者 _女性_ 桶
			
 
				 
			
 
				-As aggregations are executed, the values inside each document are evaluated to
			
 
				-determine whether they match a bucket's criteria.  If they match, the document is placed
			
 
				-inside the bucket and the aggregation continues.
			
 
				+- 奥尔巴尼属于 _纽约_ 桶
			
 
				 
			
 
				-Buckets can also be nested inside other buckets, giving you a hierarchy or
			
 
				-conditional partitioning scheme.  For example, Cincinnati would be placed inside
			
 
				-the Ohio state bucket, and the _entire_ Ohio bucket would be placed inside the
			
 
				-USA country bucket.
			
 
				+- 日期 2014-10-28 属于 _十月_ 桶
			
 
				 
			
 
				-Elasticsearch has a variety of buckets, which allow you to
			
 
				-partition documents in many ways (by hour, by most-popular terms, by
			
 
				-age ranges, by geographical location, and more).  But fundamentally they all operate
			
 
				-on the same principle: partitioning documents based on criteria.
			
 
				+当聚合开始被执行，每个文档里面的值通过计算来决定符合哪个桶的条件。如果匹配到，文档将放入相应的桶并接着进行聚合操作。
			
 
				 
			
 
				-=== Metrics
			
 
				+桶也可以被嵌套在其他桶里面，提供层次化的或者有条件的划分方案。例如，辛辛那提会被放入俄亥俄州这个桶，而 _整个_ 俄亥俄州桶会被放入美国这个桶。
			
 
				 
			
 
				-Buckets allow us to partition documents into useful subsets,((("aggregations", "high-level concepts", "metrics")))((("metrics"))) but ultimately what
			
 
				-we want is some kind of metric calculated on those documents in each bucket.
			
 
				-Bucketing is the means to an end: it provides a way to group documents in a way
			
 
				-that you can calculate interesting metrics.
			
 
				+Elasticsearch 有很多种类型的桶，能让你通过很多种方式来划分文档（按小时、按最热门的词项、按年龄区间、按地理位置等等）。但从根本上说，它们都基于同样的原理运作：根据条件来划分文档。
			
 
				 
			
 
				-Most _metrics_ are simple mathematical operations (for example, min, mean, max, and sum)
			
 
				-that are calculated using the document values.  In practical terms, metrics allow
			
 
				-you to calculate quantities such as the average salary, or the maximum sale price,
			
 
				-or the 95th percentile for query latency.
			
 
				+[[_metrics]]
			
 
				+=== 指标
			
 
				 
			
 
				-=== Combining the Two
			
 
				+桶能让我们划分文档到有意义的集合，((("aggregations", "high-level concepts", "metrics")))((("metrics")))但是最终我们需要的是对这些桶内的文档进行一些指标的计算。分桶是一种达到目的的手段：它提供了一种给文档分组的方法来让我们可以计算感兴趣的指标。
			
 
				 
			
 
				-An _aggregation_ is a combination of buckets and metrics.((("aggregations", "high-level concepts", "combining buckets and metrics")))((("buckets", "combining with metrics")))((("metrics", "combining with buckets")))  An aggregation may have
			
 
				-a single bucket, or a single metric, or one of each.  It may even have multiple
			
 
				-buckets nested inside other buckets. For example, we can partition documents by which country they belong to (a bucket), and
			
 
				-then calculate the average salary per country (a metric).
			
 
				+大多数 _指标_ 是简单的数学运算（例如最小值、平均值、最大值和求和），它们基于文档的值来计算。在实践中，指标能让你计算像平均薪资、最高出售价格、查询延迟的第 95 百分位数这样的数据。
			
 
				 
			
 
				-Because buckets can be nested, we can derive a much more complex aggregation:
			
 
				+[[_combining_the_two]]
			
 
				+=== 桶和指标的组合
			
 
				 
			
 
				-1. Partition documents by country (bucket).
			
 
				-2. Then partition each country bucket by gender (bucket).
			
 
				-3. Then partition each gender bucket by age ranges (bucket).
			
 
				-4. Finally, calculate the average salary for each age range (metric)
			
 
				+_聚合_ 是由桶和指标组成的。((("aggregations", "high-level concepts", "combining buckets and metrics")))((("buckets", "combining with metrics")))((("metrics", "combining with buckets"))) 聚合可能只有一个桶，可能只有一个指标，或者可能两个都有。也有可能有一些桶嵌套在其他桶里面。例如，我们可以通过所属国家来划分文档（桶），然后计算每个国家的平均薪酬（指标）。
			
 
				 
			
 
				-This will give you the average salary per `<country, gender, age>` combination.  All in
			
 
				-one request and with one pass over the data!
			
 
				+由于桶可以被嵌套，我们可以得到复杂得多的聚合：
			
 
				 
			
 
				+1. 通过国家划分文档（桶）
			
 
				 
			
 
				+2. 然后通过性别划分每个国家桶（桶）
			
 
				 
			
 
				+3. 然后通过年龄区间划分每个性别桶（桶）
			
 
				 
			
 
				+4. 最后，为每个年龄区间计算平均薪酬（指标）
			
 
				 
			
 
				+最后将告诉你每个 `<国家, 性别, 年龄>` 组合的平均薪酬。所有的这些都在一个请求内完成并且只遍历一次数据！
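
The four-step example in the translated text (country, then gender, then age range, then average salary) can be sketched in plain Python to make the bucket/metric split concrete. This is only an illustration of the concept, not how Elasticsearch implements aggregations: the sample documents, the field names, the `AGE_RANGES` boundaries, and the helper names `age_bucket` and `average_salary_by_bucket` are all assumptions made up for the example, and the nested buckets are flattened into one composite `(country, gender, age range)` key for brevity.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are assumptions for illustration.
EMPLOYEES = [
    {"country": "US", "gender": "female", "age": 34, "salary": 90000},
    {"country": "US", "gender": "female", "age": 38, "salary": 110000},
    {"country": "US", "gender": "male", "age": 52, "salary": 80000},
    {"country": "CN", "gender": "male", "age": 29, "salary": 60000},
]

# Hypothetical age ranges: (label, inclusive lower bound, exclusive upper bound).
AGE_RANGES = [("under-30", 0, 30), ("30-49", 30, 50), ("50-plus", 50, 200)]


def age_bucket(age):
    """Return the label of the age range that contains `age`."""
    for label, low, high in AGE_RANGES:
        if low <= age < high:
            return label
    return "other"


def average_salary_by_bucket(docs):
    """Bucket by (country, gender, age range) and average the salary per bucket.

    A single loop over the documents accumulates a running sum and count for
    each bucket, so the data is traversed only once.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket key -> [salary sum, doc count]
    for doc in docs:
        key = (doc["country"], doc["gender"], age_bucket(doc["age"]))
        totals[key][0] += doc["salary"]
        totals[key][1] += 1
    # The metric: average salary computed from each bucket's sum and count.
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    for key, avg in sorted(average_salary_by_bucket(EMPLOYEES).items()):
        print(key, avg)
```

The bucketing criteria live entirely in how the key is built, and the metric lives entirely in the final division, which mirrors the separation the section describes.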