Data Pipeline

SINCE VERSION 1.0

What is a Data Pipeline?

A Data Pipeline, or Pipeline for short, is a YAML script in PIPEFORCE that describes the flow of data from one endpoint to another. Between these endpoints, the data can be enriched, transformed, cleansed and so on, so that it becomes compatible with both integration endpoints.

Such a pipeline typically consists of one or more so-called Commands. A Command is a server-side task.

Here is an example of such a pipeline YAML script which simply downloads a JSON document and stores it in the attached cloud storage:

pipeline:
  - http.get: https://somedomain.tld/rest/contracts
  - drive.save: contracts.json

In the web portal of the enterprise version of PIPEFORCE there is also a no-code editor that lets you design such pipelines via drag & drop.

What is a Command?

A Command in PIPEFORCE is a server-side component that performs a single operation. It can be executed remotely via HTTP. It takes an optional input body and optional parameters, processes a certain task, and finally produces an optional output, which is returned to the caller as the response. Here is an example of a command URL which can be called via an HTTP GET request:

https://hub-NAMESPACE.pipeforce.net/api/v3/command:datetime

A request to this URL calls the datetime command and returns the current date and time of the server.

Multiple Commands can optionally be connected like "LEGO" bricks. They are called one after another on the server side, and the output of one command becomes the input of the next command. This is then called a Pipeline of Commands.

Commands in PIPEFORCE implement the Command Bus Pattern, which is gaining popularity, especially in cloud-native enterprise architectures.

PIPEFORCE offers many different commands for different operations. For example, commands can

  • Upload and download files

  • Encrypt and decrypt data

  • Save data into a database or read from it

  • Transform and map data

  • Connect to other systems, and read / write data

  • And much more...

You can find all built-in commands in the commands reference.

Command Name

Each command has a unique name which is always written in lower case and follows the dot.notation. Here are some examples of valid command names in dot notation:

  • barcode.create

  • data.list.iterate

  • log

  • mail.send

  • property.put

As you can see, the command name is usually structured like this:

<group1>.<groupN>.<verb>

By contrast, these are examples of invalid command names:

  • data_convert

  • dataEntity

  • my command

The full list of available commands and their parameters can be found in the commands reference.

COMMAND NAMES VS. REST RESOURCE NAMES

Although many command names carry resource-based semantics similar to HTTP GET, POST or PUT in REST, they do not follow this approach completely, since a Command is typically bound to a server-side operation, not only to a resource operation. Therefore, the operation type of a command is defined by its name, not by an HTTP method. For example: property.put or config.get, to name just two.

Command Alias

Besides the default command name, a command can also have one or more aliases. These are alternative names which can be used the same way as the default name. For example, the command mail.send also has the alias mail. Both can be used the same way for sending emails since they point to the same command implementation.

For a list of the alias names of a command, see the command documentation.

Command Parameters

Commands can have zero or more parameters, each of which is a name-value pair. The parameters can be passed to the command in different ways, depending on the execution context you're working in. See Executing a Command below.

Here is an example to set parameters to a command as part of a single HTTP call:

And here you can see the same command with parameters, embedded inside a Pipeline script:
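
As a minimal sketch, assuming the command is mail.send and that it accepts parameters named to and subject (both names are used here for illustration and should be checked against the commands reference), parameters inside a pipeline script could look like this:

```yaml
pipeline:
  # Parameters are nested below the command as name-value pairs.
  - mail.send:
      to: recipient@somedomain.tld      # assumed parameter name
      subject: Monthly contract report  # assumed parameter name
```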

Command parameters can be primitive types (integer, decimal, boolean, string) or complex types like YAML or JSON. Here is an example to set a JSON as parameter:
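
For illustration, a JSON value can be written inline as the value of a parameter. The sketch below assumes that cache.put accepts a parameter named value next to key; treat that parameter name as an assumption and verify it in the commands reference:

```yaml
pipeline:
  - cache.put:
      key: customer
      # A JSON object passed as a complex parameter value (parameter name assumed).
      value: {"firstName": "Max", "lastName": "Smith"}
```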

Default Command Parameter (short form)

Since Version 9.0

Some commands need only one mandatory parameter to work. This is called the default parameter.

For example, let's look at the log command:
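
In the long form, the parameter is written out by name. A minimal sketch using the message parameter of log:

```yaml
pipeline:
  - log:
      message: "Hello World"
```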

The command log requires the mandatory parameter message. This parameter is also marked as the default parameter in the command docs using the (default) tag. Therefore, the short form can also be used:
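
A sketch of the short form, which should be equivalent to setting message: "Hello World" explicitly:

```yaml
pipeline:
  # The value directly follows the command name; the parameter name is omitted.
  - log: "Hello World"
```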

As you can see, in the short form you can omit writing the parameter name since it is obvious.

Whether a command supports a default parameter depends on the command. Check the command docs to find out. The default parameters are marked with (default).

Limitations

Default parameter value can only be primitive

The value of a default parameter can only be a primitive like string, number or boolean. It cannot be a JSON object or array. If you need to pass a JSON, pass it as a string if the command supports this. In most cases the command will auto-convert from string to JSON if required. For example:
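
A minimal sketch of passing JSON as a string in the short form. The log command is used only for illustration; whether a given command converts the string back to JSON depends on that command:

```yaml
pipeline:
  # The JSON is quoted, so YAML treats it as a plain string, not as an object.
  - log: '{"firstName": "Max", "lastName": "Smith"}'
```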

The default parameter value can only be primitive like string, number or boolean but no object or array.

Mixing default and ordinary parameters not allowed

Note that mixing the default parameter with ordinary parameters is not possible since the YAML specification does not allow this:
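
The following sketch shows the problem and a way around it. The parameter name level is hypothetical and only serves as a second parameter for the illustration:

```yaml
pipeline:
  # Not valid YAML: a key cannot hold a scalar value and nested keys at once.
  # - log: "Hello World"
  #     level: INFO

  # Valid: the long form, with every parameter named explicitly.
  - log:
      message: "Hello World"
      level: INFO   # hypothetical second parameter
```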

If you use the default parameter of a command, you cannot set any other ordinary parameter on it.

Executing a Command

A single Command can be executed by sending it as an HTTP GET or POST request using the endpoint /api/v3/command. The full URL structure of this endpoint is always like this:

HTTP GET

Here is an example to execute the log command as an HTTP GET request, and set its message parameter to a string value using an HTTP request parameter:

See HTTP Execution Reference for a summary of all supported HTTP options.

HTTP POST

Here is an example to execute a single command as an HTTP POST request, and set the message parameter of the log command using an HTTP POST data body in curl:

See HTTP Execution Reference for a summary of all supported HTTP options.

CLI

You can also use the PIPEFORCE CLI in order to execute a single Command. Here is an example to call the log command and set the message parameter accordingly:

Body

Besides parameters, a command can also consume and produce a body, similar to an HTTP POST request and response.

Unlike parameters, the input body is typically a more complex document and/or a larger data stream which must be modified in some way. Therefore, it is passed in and written out via the body by default.

Here is an example to pass JSON data via the body to a cache.put command using an HTTP POST request and the curl tool:

As you can see, the command parameter key has been set as request parameter here and in the HTTP POST body a JSON string is set using the -d switch of curl. This JSON will become the input body (data) for the cache.put command. In order to automatically parse this JSON string into a JSON instance and use it as such in the command, you can optionally specify the Content-Type: application/json header.

NOTE

  • Authentication is done here by using the Authorization header. See Authorization for details.

  • Replace http://hub-trial.pipeforce.org with the URL of your target system.

See HTTP Execution Reference for a summary of all supported HTTP options.

Pipeline in Detail

Two or more Commands can be chained into a flow, called a Pipeline. When such a pipeline is executed, its commands are executed one after another: the output message of the first command becomes the input message of the next command, and so on.

By default, such a pipeline is written as a script in the YAML format, which you can then manage with your preferred source code management tool like Git.

Here is an example, which connects two simple commands: The datetime command produces the current date and time and the log command finally logs it:
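
A minimal sketch of such a two-command pipeline. It assumes that commands without parameters can be listed by name only and that log falls back to the body when no message is set; both points are assumptions to verify in the commands reference:

```yaml
pipeline:
  - datetime   # writes the current date and time to the body
  - log        # receives that body as its input and logs it
```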

In the YAML, a pipeline definition starts with the pipeline: element, followed by a list of commands. Each command to be executed is defined as a list element by its name using an indent and a dash -.

By default, the message output (= body) of the first command (datetime in this example) will automatically become the input message (= body) of the next command (log in this example). Therefore no declaration and exchange of variables is required here. The exchange of body data between commands is implicit.

Parameters

In case you need to configure a command by specifying parameters for it in a pipeline, you can do so by writing them below the command as name-value pairs, indented with spaces relative to the command name (two or more spaces are customary; the YAML specification does not permit tab characters for indentation):

You can write parameter values without any quotes. Optionally, you can use single quotes for parameter values:

or double quotes:
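
The three quoting styles side by side, as a sketch using the message parameter of the log command:

```yaml
pipeline:
  - log:
      message: Hello World     # no quotes
  - log:
      message: 'Hello World'   # single quotes
  - log:
      message: "Hello World"   # double quotes
```

Quotes become necessary as soon as a value contains characters with a special meaning in YAML, for example a colon followed by a space, or a # preceded by a space.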

Multi-line parameters

Parameters can also be multiple lines long:
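
A sketch of an unquoted value that continues on indented follow-up lines. In this plain style, YAML folds the line breaks into single spaces, so the command receives one line of text:

```yaml
pipeline:
  - log:
      message: This text is written
        across several lines and is
        folded into a single line.
```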

In case you need to pass multi-line parameter values, with the line breaks to be preserved, you can do so by using the pipe | character:
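
A sketch of the literal block style introduced by the pipe character. Every line break inside the block is kept as written:

```yaml
pipeline:
  - log:
      message: |
        Dear customer,
        your contract has been stored.
        Kind regards
```

With a plain |, the value ends with exactly one trailing line break; |- would strip it.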

There are many more options on how to format line breaks in YAML. For full details, see the YAML specification: https://yaml.org/spec/1.2.2/

JSON Parameters

Furthermore, it is possible to specify a JSON as parameter, like this example shows:
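
A sketch with a nested JSON document as parameter value. The log command and the sample data are chosen for illustration only; whether a particular parameter accepts JSON is documented per command:

```yaml
pipeline:
  - log:
      # The JSON is written as-is; it only has to stay indented below the key.
      message: {
          "contract": {
            "id": 4711,
            "tags": ["new", "signed"]
          }
        }
```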

As you can see, there is no need to escape or convert the JSON to a string. It can be placed as JSON 1:1 inside the YAML.

PEL Parameters

Besides static string values, it is also possible to pass dynamic values to the parameters. This is done by using a Pipeline Expression. For example:

Interpolation inside strings is also possible out of the box:
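
Both forms can be sketched as follows. The #{...} delimiters and the variable name are assumptions made for this illustration; check the Pipeline Expression documentation for the exact syntax of your version:

```yaml
vars:
  name: Max   # hypothetical variable
pipeline:
  # The whole parameter value is a pipeline expression (delimiter syntax assumed).
  - log:
      message: "#{vars.name}"
  # An expression interpolated inside a longer string.
  - log:
      message: "Hello #{vars.name}, welcome!"
```

The values are quoted because an unquoted # at the start of a value would be read by YAML as the beginning of a comment.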

For more information about Pipeline Expressions, see the Pipeline Expression Language (PEL) section.

Executing a Pipeline

Execute by HTTP request

Executing a Pipeline with HTTP is simple:

  1. Create the Pipeline YAML script

  2. Upload this YAML script in an HTTP POST request to the server

The server then executes your Pipeline and returns with the final output message.

As an example let's assume a more sophisticated pipeline like this, with different parameter value formats:
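
A sketch of what such a pipeline could look like. Only drive.save appears elsewhere on this page; the command names drive.read and pdf.stamp, all parameter names, and the file paths are assumptions made for illustration:

```yaml
pipeline:
  # Load the PDF from the drive (unquoted value).
  - drive.read:
      path: invoices/invoice.pdf
  # Put a text onto the PDF (single-quoted value).
  - pdf.stamp:
      text: 'Approved'
  # Store the result back on the drive (double-quoted value).
  - drive.save:
      path: "invoices/invoice.pdf"
```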

This example loads a PDF file, sets a text on it and stores it back. In order to execute this Pipeline YAML, you can send it to the server using curl for example:

When using curl, the EOF syntax is required here in order to preserve the indents of the YAML script. Optionally, you can write the YAML in a local file and pass this file as the argument for the body. See the example below, where the pipeline is specified in a file named mypipeline.yaml:

See HTTP Execution Reference for a summary of all supported HTTP options.

Sending the Body Message with HTTP

You can also send the pipeline body in an HTTP request. See this example:
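
A minimal sketch of a pipeline script that carries its own initial body. It assumes that log without a message parameter logs the current body; the sample data is made up:

```yaml
pipeline:
  - log   # assumed to log the current body when no message is given
body: {"firstName": "Max", "lastName": "Smith"}
```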

Note that the value of the body can be a primitive or a JSON without additional escaping required.

Here is an example how to send this using curl:

The log output will be:

Execute in Portal

The portal offers an advanced online editor with syntax highlighting, code completion and debugging support, where you can write pipelines and execute them online. This is the easiest and preferred way to execute and test your pipelines ad hoc. Here you can see a simple pipeline after its ad-hoc execution in the online editor:

Execute in CLI

Another way to execute a pipeline is the Command Line Interface (CLI).

Execute local pipeline file

Let's assume you have a local pipeline YAML stored at src/global/app/myapp/pipeline/test.pi.yaml inside your PIPEFORCE workspace. Then you can execute it via this CLI call (the path must start with src/):

This will load the local pipeline YAML and run it by sending it to the server for execution. The result will be printed out to your terminal if there is any.

Execute persisted remote pipeline

In case you have stored your pipeline on the server side in the Property Store, you can execute it using this call (the path must start with global/):

This command searches for a property in the property store with path global/app/myapp/pipeline/test and executes it. Finally, it sends any result back to your terminal.

Pipeline Sections

Every pipeline YAML script may consist of four main sections:

  • headers

  • vars

  • pipeline

  • body

Here is an example of a pipeline script which defines all of these sections:
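
A sketch with all four sections in one script. The header name and all values are illustrative and not taken from the headers reference:

```yaml
headers:
  description: Logs a start message   # illustrative header
vars:
  firstName: Max                      # transient variable
pipeline:
  - log:
      message: Pipeline started
body: {"status": "new"}               # initial body value
```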

All sections except pipeline are optional in a pipeline script. Even if a section is not explicitly defined in the pipeline script, its scope exists implicitly. That means you can access it and read or set values on it without declaring it in the pipeline, for example by using a pipeline expression (PE).

headers

The headers section is optional. A header is a name-value pair which defines "global configuration" hints and settings for the given pipeline. Only text is allowed as content, i.e. no complex objects like JSON. Headers are not meant to be changed during pipeline processing, even though this is possible in rare cases.

Whether and which headers are required depends on the pipeline and its commands. There are some default headers in order to configure the pipeline processing. See the headers reference for details.

It is similar to HTTP Request Headers.

You can read and set values in the headers section using the Pipeline Expression Language (PEL).

vars

The vars section is optional and contains transient variables as name value pairs. It is meant as a transient scope for states during the pipeline processing.

Values can also be complex objects and documents like JSON for example.

Values can be changed during pipeline processing.

You can access values in the vars scope using the Pipeline Expression Language (PEL).

pipeline

The pipeline section is mandatory and lists all commands which must be executed in given order.

See the commands reference for details about the default commands.

You can set dynamic parameter values on commands using the Pipeline Expression Language (PEL).

body

The body section is optional. It defines a single object to be used as “data pool” or transformation data during the pipeline processing.

In case a command returns a value, by default, it will write this value to the body implicitly. A previous command's value in the body is thereby overwritten by the command which comes next.

It's also possible to define an initial value for the body in the pipeline. If no such initial value is set, the body is initially null.

You can access values in the body scope using the Pipeline Expression Language (PEL).

Pipeline as JSON

Sometimes it is necessary to use JSON as the pipeline definition language instead of YAML. Let's assume a pipeline written in YAML like this:
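
As a starting point, here is the introductory download-and-store pipeline from the top of this page in its YAML form:

```yaml
pipeline:
  - http.get: https://somedomain.tld/rest/contracts
  - drive.save: contracts.json
```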

You can rewrite this YAML pipeline as JSON pipeline like this:
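
A sketch for a pipeline that calls http.get and then drive.save, each in short form. It assumes that the JSON form mirrors the YAML structure one-to-one: the command list becomes an array, and each command becomes an object with the command name as its key:

```json
{
  "pipeline": [
    { "http.get": "https://somedomain.tld/rest/contracts" },
    { "drive.save": "contracts.json" }
  ]
}
```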

In order to execute such a JSON pipeline, you can send it to the same POST endpoints as you would with YAML pipelines, but with the header changed to Content-Type: application/json.

Pipeline as URI

Besides YAML and JSON, there is a third option to define a pipeline: a pipeline URI, which is an inline version of a pipeline. This is handy in case you must define a pipeline as a “one-liner”.

You can rewrite any pipeline YAML as a single URI request query string.

Let's assume this pipeline YAML:

You can rewrite this pipeline YAML as inline using a URI request query string, which looks like this:

You can then execute such a pipeline URI using the CLI:

So the format is like this:

The request parameter names become the command names and the parameters to this command become the request parameter values. Each single command parameter name ends with a colon :. Multiple command parameters are separated by a semicolon ;.
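
Applying these rules to a small pipeline gives the following sketch. The inline form in the trailing comment is derived from the description above rather than copied from a reference, and the parameter names to and subject are assumed:

```yaml
# Pipeline in YAML form:
pipeline:
  - mail.send:
      to: recipient@somedomain.tld
      subject: Hello
  - log:
      message: Done

# The same pipeline as an inline URI, following the rules above
# (characters with a special meaning in a URI may need encoding):
#   mail.send=to:recipient@somedomain.tld;subject:Hello&log=message:Done
```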

This format complies with the URI syntax and is especially handy for smaller pipelines that you want to execute ad hoc from your terminal, or in places where a pipeline can only be defined as a one-liner, since it packs multiple instructions into a single line.

Auto-completion support

In order to enable auto-completion support for your pipeline YAML scripts in your local development editor, you need an editor which supports YAML schema validation. Then, you can have auto-completion which shows all available commands and their parameters:

Auto-completion in IntelliJ

To enable auto-completion in IntelliJ, open preferences and navigate to JSON Schema Mappings:

Preferences → Languages & Frameworks → Schemas & DTDs → JSON Schema Mappings

Add a new schema mapping with these values:

  • Name: pipeline-schema

  • Schema URL: https://hub-<NS>.pipeforce.net/api/v3/command/pipeline.schema.get

  • Schema version: JSON Schema version 7

Add a new file path pattern: *.pi.yaml

Now, try it out: Create a new file foo.pi.yaml and start typing pipeline:.

The key combination [Ctrl] + [Space] should give you a list of suggested values for your YAML.

A YAML pipeline script should always end with the suffix .pi.yaml, which stands for pipeline scripts written in YAML.

Auto-completion in Visual Studio Code

If you want to also enable code-completion for your pipeline yaml files in your VS Code editor, you need to install the YAML language support plugin from Red Hat first: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml

Then open Preferences → Settings and add this line to your configuration settings.json and save it:
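
A sketch of such a settings.json entry. It uses the yaml.schemas setting of the Red Hat YAML extension, which maps a schema URL to file patterns, together with the schema URL from the IntelliJ section; replace <NS> with your namespace:

```json
{
  "yaml.schemas": {
    "https://hub-<NS>.pipeforce.net/api/v3/command/pipeline.schema.get": ["*.pi.yaml"]
  }
}
```

If your settings.json already contains other entries, add only the yaml.schemas property to the existing object.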

Now, try it out: Create a new file foo.pi.yaml and start typing pipeline:.

The key combination [Ctrl] + [Space] should give you a list of suggested values for your YAML.

A local YAML pipeline script should always end in suffix .pi.yaml which stands for pipeline scripts written in YAML.

Auto-completion in the Portal

The built-in online workbench and the playground in the PIPEFORCE portal support pipeline script completion out of the box.

To start completion simply press [Ctrl] + [Space].

Besides completion for available commands and their parameters, it also supports completion for other parts, such as utilities and variables.