Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Below, there is an introduction to all data mapping and transformation toolings you can use in PIPEFORCE.

Data Size Classification

Before you select the right data mapping and transformation tool, you should always think about the expected input data first. Depending on its size, some tools could be better suitable than others. Here is a classification on data size which is very often used:

Class

Size

Description

Small

< 10 MB

Can be handled easily in memory (on a multi-user system).

Effort and cost of implementation is usually low.

Medium

< 100MB

Can be handled on a single server node, but needs persistence in most cases because it is too big to be processed in memory (on a multi-user system).

Effort and cost of implementation is usually low to medium, but depends also on overall data complexity.

Large

<= Gigabytes

Requires special data management techniques and systems. Must be distributed across systems.

Effort and cost of implementation is usually expensive but this depends on overall data complexity.

Very Large
(Big Data)

>= Terabytes

Also known as "Big Data", these datasets encompass volumes of data so large that they require special processing techniques on multiple highly scalable nodes. They usually range from terabytes to petabytes or more.

Effort and cost of implementation is usually very expensive and depends highly on overall data complexity.

Info

Note that the boundaries between these classifications are sometimes fuzzy and it is not always obvious in the first place, which class really applies. So make sure you investigate enough to be clear before you start implementation.

For example, ask the user or the customer upfront about the expected amount of data and define this as a non-functional requirement for implementation. Because the difference in duration and cost of implementation could be exponentially depending on the data size and its complexity.

Transformer Commands

A transformer command in PIPEFORCE is a command which transforms / converts data from one structure into another. For example:

...

See the reference documentation for a full list of the available Pipeline Utils.

Querying JSON Data

The One of the best performing way ways of selecting and filtering JSON data is by applying a query directly on the database (property store). Since only the data is returned which matches the given query and the query algorithms are applied directly in the database layer, this should be the preferred way for big medium and very big large sized JSON documents.

For more details on this see: JSON Property Querying.

You can also use one of the data.filter.* commands for this. But they are all working on a JSON document in memory. They are fast and effective for small to medium sized JSONs.

Integration Patterns Overview​

...

For more information how to do this, see: Microservices Framework.

Info

PIPEFORCE TOOLINGS

These are some suggested PIPEFORCE toolings in order to implement this pattern you can select from to fit your specific needs:

...