How much capacity do I need?
First of all, an index is composed of many documents. If your application has a clickable search results page, each item in the list probably represents a document in your index. From there, you can estimate your document count by working backwards from your application’s search interactions and extrapolating from your primary database.
Total data size can vary depending on your use case. Indexing simple title and category metadata has substantially different data requirements than indexing the entire contents of a multi-megabyte PDF document. Similarly, the size of the raw source data does not necessarily translate one-to-one into the final index size. The structure of the underlying index can compress some kinds of data and expand others.
Ultimately, the best way to project your data requirements is to index a representative subset of your data into a development or staging index. We’ll measure and report the total usage, which you can use to establish a rough average per document and project accordingly.
For example, if 10,000 documents take up 100 MB, that works out to an average of 10 KB per document. An index of 2,000,000 such documents can reasonably be projected at around 20 GB.
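The projection above is simple enough to script. Here is a minimal sketch in Python; the function name `project_index_size` is hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes) and that your sample is representative of the full data set.

```python
# Decimal units are an assumption; swap in 1024-based units if you prefer.
KB = 1000
MB = 1000 ** 2
GB = 1000 ** 3


def project_index_size(sample_bytes, sample_docs, target_docs):
    """Scale the measured average document size up to the target document count."""
    avg_bytes_per_doc = sample_bytes / sample_docs
    return avg_bytes_per_doc * target_docs


# The example from the text: 10,000 sample documents measured at 100 MB.
avg = (100 * MB) / 10_000
projected = project_index_size(100 * MB, 10_000, 2_000_000)

print(f"average per document: {avg / KB:.0f} KB")
print(f"projected index size: {projected / GB:.0f} GB")
```

Treat the result as a rough estimate: if your real documents vary a lot in size, a larger or more carefully chosen sample will give a better average.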
How much traffic should I plan for?
A good place to start here would be something proportional to user activity for your application. How many active users do you expect at your peak? How many requests are they making to the site in a session, and how many of those are going to fire off search requests?
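One way to turn those questions into a number is a back-of-the-envelope calculation like the sketch below. The function name and every input value are hypothetical placeholders, not measurements; it also assumes requests are spread evenly across a session, which real traffic rarely is.

```python
def peak_search_rate(peak_users, requests_per_session, search_fraction, session_seconds):
    """Estimate search requests per second at peak.

    Assumes each active user's requests are spread evenly over one session.
    """
    searches_per_session = requests_per_session * search_fraction
    return peak_users * searches_per_session / session_seconds


# Placeholder inputs: 6,000 concurrent users, 20 requests per 10-minute
# session, and a quarter of those requests triggering a search.
rate = peak_search_rate(
    peak_users=6000,
    requests_per_session=20,
    search_fraction=0.25,
    session_seconds=600,
)

print(f"estimated peak: {rate:.0f} searches per second")
```

Because traffic tends to arrive in bursts, it is reasonable to pad an estimate like this with some headroom.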
If you’re switching from another search technology, then you may be able to easily extrapolate from existing activity. (Don’t forget to allow for a bump in usage as the performance goes up!)
If your app is new (or search is a new function in your app), then we think it can’t hurt to be optimistic. Better to have the capacity and not need it than to need it and not have it. Plus, with our elastic usage-based pricing, we make it easy to change to a more appropriate plan after the initial launch produces some better numbers.
A small note on “requests”: as with documents, use cases, and agile software development methodologies, not all are the same! Our numbers assume a request duration of about 10 ms, a figure that’s comfortably above our observed median request time in real-world production traffic, and not far from the 99th percentile of 20 ms. (And that’s just the back-end search engine time, too; we don’t penalize you for bytes in transit across the tubes.)
That said, some kinds of requests deserve a little more time and attention. If you have complex queries that need 100 ms to run, our systems would treat each of them as roughly 10 requests for the purposes of our plan usage metering.
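To see how slower queries affect your plan usage, here is a minimal sketch of that weighting. It assumes metering scales linearly with back-end duration against the 10 ms baseline; the text only says "roughly", so the exact rounding and any minimum charge are not specified here, and the function name is hypothetical.

```python
BASELINE_MS = 10.0  # the request duration the plan numbers are based on


def metered_requests(duration_ms, baseline_ms=BASELINE_MS):
    """Approximate how many plan requests one query counts as.

    Assumption: usage scales linearly with back-end search time.
    """
    return duration_ms / baseline_ms


# The example from the text: a complex query that takes 100 ms.
print(metered_requests(100))

# A hypothetical mix of query durations (in ms) and their combined weight.
durations_ms = [10, 10, 20, 100]
print(sum(metered_requests(d) for d in durations_ms))
```

If you know the typical duration of your heavier queries, you can weight your traffic estimate this way before picking a plan.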