[[scale]]
== Designing for scale

Elasticsearch is used by some companies to index and search petabytes of data
every day, but most of us start out with something a little more humble in
size. Even if we aspire to be the next Facebook, it is unlikely that our bank
balance matches our aspirations.  We need to build for what we have today, but
in a way that will allow us to scale out flexibly and rapidly.

Elasticsearch is built to scale.  It will run very happily on your laptop or
in a cluster containing hundreds of nodes, and the experience is almost
identical. Growing from a small cluster to a large cluster is almost entirely
automatic and painless. Growing from a large cluster to a very large cluster
requires a bit more planning and design, but it is still relatively painless.

Of course, it is not magic.  Elasticsearch has its limitations too.  If you
are aware of those limitations and work with them, the growing process will be
pleasant.  If you treat Elasticsearch badly, you could be in for a world of
pain.

The default settings in Elasticsearch will take you a long way but, to get the
most bang for your buck, you need to think about how data flows through your
system.  We will talk about two common data flows: <<time-based>>, such as log
events or social network streams, where relevance is driven by recency; and
<<user-based>>, where a large document corpus can be subdivided by user or
customer.

This chapter will help you to make the right decisions up front, to avoid
nasty surprises later on.