...
Class | Size | Description |
---|---|---|
Small | < 10 MB | Can be handled easily in memory (on a multi-user system). Effort and cost of implementation are usually low. |
Medium | 10 MB – 100 MB | Can be handled on a single server node, but in most cases needs persistence because it is too big to be processed in memory (on a multi-user system). Effort and cost of implementation are usually low to medium, but also depend on overall data complexity. |
Large | 100 MB – Gigabytes | Requires special data management techniques and systems and must be distributed across multiple nodes. Effort and cost of implementation are usually high, depending on overall data complexity. |
Very Large | >= Terabytes | Also known as "Big Data", these datasets are so large that they require special processing techniques on multiple highly scalable nodes, typically ranging from terabytes to petabytes or more. Effort and cost of implementation are usually very high and depend heavily on overall data complexity. |
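The classification can be expressed as a simple threshold lookup. Below is a minimal Python sketch that maps a dataset size in bytes to the classes above; the 10 MB and 100 MB thresholds come from the table, while the 1 TB boundary between Large and Very Large is an assumption, since the table only gives rough orders of magnitude.

```python
def classify_dataset(size_bytes: int) -> str:
    """Map a dataset size in bytes to one of the size classes above.

    The 10 MB and 100 MB thresholds come from the table; the 1 TB
    boundary between Large and Very Large is an assumption, since
    the table only gives rough orders of magnitude.
    """
    MB = 1024 ** 2
    TB = 1024 ** 4
    if size_bytes < 10 * MB:
        return "Small"
    if size_bytes < 100 * MB:
        return "Medium"
    if size_bytes < TB:  # assumed upper bound for the gigabyte range
        return "Large"
    return "Very Large"  # terabytes and beyond ("Big Data")


# Example: a 250 MB dataset falls into the "Large" class.
print(classify_dataset(250 * 1024 ** 2))  # -> Large
```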
Info |
---|
Note that the boundaries between these classes are sometimes fuzzy, and it is not always obvious at first which class really applies. So make sure you investigate enough to be certain before you start implementation: for example, ask the user or customer upfront about the expected amount of data and define it as a non-functional requirement, because the duration and cost of implementation can grow exponentially with data size and complexity. |
...