Data Pipelines and Commands
SINCE VERSION 1.0
- 1 What is a Data Pipeline?
- 2 What is a Command?
- 2.1 Command Name
- 2.1.1 Command Alias
- 2.2 Command Parameters
- 2.3 Executing a Command
- 2.3.1 HTTP GET
- 2.3.2 HTTP POST
- 2.3.3 CLI
- 2.4 Body
- 3 Pipeline in Detail
- 3.1 Parameters
- 3.1.1 Multi-line parameters
- 3.1.2 JSON Parameters
- 3.1.3 PEL Parameters
- 3.2 Executing a Pipeline
- 3.2.1 Execute by HTTP request
- 3.2.2 Execute in Portal
- 3.2.3 Execute in CLI
- 3.3 Pipeline Sections
- 3.4 Pipeline as JSON
- 3.5 Pipeline as URI
- 3.6 Auto-completion support
What is a Data Pipeline?
A Data Pipeline, or Pipeline for short, is a YAML script in PIPEFORCE that describes the flow of data from one endpoint to another. Between these endpoints, the data can be enriched, transformed, cleansed and so on, so that it becomes compatible with both integration endpoints.
Such a pipeline typically consists of one or more so-called Commands. A command is a server-side task.
Here is an example of such a pipeline YAML script which simply downloads a JSON document and stores it in the attached cloud storage:
pipeline:
- http.get: https://somedomain.tld/rest/contracts
- drive.save: contracts.json
In the web portal of the enterprise version of PIPEFORCE there is also a no-code editor, so you can design such pipelines via drag and drop:
What is a Command?
A Command in PIPEFORCE is a server-side component which performs a single operation. It can be executed remotely via HTTP. It takes an optional input body and optional parameters, processes a certain task, and finally produces an optional output, which is the response to the caller. Here is an example of a command URL which can be called via an HTTP GET request:
https://hub-NAMESPACE.pipeforce.net/api/v3/command:datetime
A request to this URL calls the datetime command and returns the current date and time from the server.
Multiple Commands can optionally be connected like "LEGO" bricks. They are called one after another on the server side, and the output of one command becomes the input of the next command. This is then called a Pipeline of Commands.
Commands in PIPEFORCE implement the Command Bus Pattern, which is gaining popularity, especially in cloud-native enterprise architectures.
Many commands for different operations are available in PIPEFORCE. For example, commands can:
Upload and download files
Encrypt and decrypt data
Save data into a database or read from it
Transform and map data
Connect to other systems, and read / write data
And much more...
You can find all built-in commands in the commands reference.
Command Name
Each command has a unique name which is always written in lower case and follows the dot.notation. Here are some examples of valid command names in dot notation:
barcode.create
data.list.iterate
log
mail.send
property.put
As you can see, the command name is usually structured like this:
<group1>.<groupN>.<verb>
By contrast, these are examples of invalid command names:
data_convert
dataEntity
my command
The full list of available commands and their parameters can be found in the commands reference.
COMMAND NAMES VS. REST RESOURCE NAMES
Even though many command names have resource-based semantics similar to HTTP GET, POST or PUT in REST, they do not follow this approach 100%, since a Command is typically bound to a server-side operation, not only to a resource operation. Therefore, the operation type of a command is defined by its name, not by a method header. Examples are property.put or config.get, to name just a few.
Command Alias
Besides the default command name, a command can also have one or more aliases. These are alternative names which can be used the same way as the default name. For example, the command mail.send also has the alias mail. Both can be used the same way for sending emails, since they point to the same command implementation.
For a list of the alias names of a command, see the command documentation.
Command Parameters
Commands can have zero or more parameters, where each parameter is a name-value pair. The parameters can be passed to the command in different ways, depending on the execution context you're working in. See Executing a Command below.
Here is an example to set parameters to a command as part of a single HTTP call:
And here you can see the same command with parameters, embedded inside a Pipeline script:
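As a minimal sketch of the pipeline form, the snippet below sets the message parameter on the log command. Both names appear elsewhere on this page; the parameter value is only an illustration.

```yaml
pipeline:
  # One command (log) with one name-value parameter (message).
  - log:
      message: "Hello World"
```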
Command parameters can be primitive types (integer, decimal, boolean, string) or complex types like YAML or JSON. Here is an example to set a JSON as parameter:
Default Command Parameter (short form)
Since Version 9.0
Some commands need only one mandatory parameter to work. This is called the default parameter.
For example, let's look at the log command:
The command log requires the mandatory parameter message. This parameter is also marked as the default parameter in the command docs using the (default) tag. Therefore, the short form can also be used:
As you can see, in the short form you can omit writing the parameter name since it is obvious.
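The two forms described above can be sketched side by side as follows. This is an illustration based on the description of the log command and its default parameter message on this page, not a copy of the original example.

```yaml
pipeline:
  # Long form: the parameter name is written out.
  - log:
      message: "Hello World"
  # Short form: the value is assigned directly to the command name
  # and is taken as the default parameter (message).
  - log: "Hello World"
```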
It depends on the command whether it supports a default parameter or not. See the command docs to find out. The default parameters are marked with (default).
Limitations
Default parameter value can only be primitive
The value of a default parameter can only be a primitive like string, number or boolean. It cannot be a JSON object or array. If you need to pass a JSON, pass it as a string if the command supports this. In most cases the command will auto-convert from string to JSON if required. For example:
The default parameter value can only be primitive like string, number or boolean but no object or array.
Mixing default and ordinary parameters not allowed
Note that mixing the default parameter with ordinary parameters is not possible, since the YAML specification does not allow this:
If you use the default parameter of a command, you cannot set any other ordinary parameter on it.
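A short sketch of what this limitation means in practice: as soon as a second parameter is needed, the long form has to be used. The parameter name level is hypothetical and used for illustration only; check the commands reference for the parameters the log command actually supports.

```yaml
pipeline:
  # Short form: only the default parameter (message) can be set.
  - log: "Hello World"
  # Long form: needed as soon as a second parameter is set.
  # "level" is a hypothetical parameter name used for illustration.
  - log:
      message: "Hello World"
      level: "INFO"
```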
Executing a Command
A single Command can be executed by sending it as an HTTP GET or POST request using the endpoint /api/v3/command. The full URL structure of this endpoint is always like this:
HTTP GET
Here is an example to execute the log command as an HTTP GET request, and set its message parameter to a string value using an HTTP request parameter:
Also see PIPEFORCE HTTP API.
HTTP POST
Here is an example to execute a single command as an HTTP POST request, and set the message parameter of the log command using an HTTP POST data body in curl:
Also see PIPEFORCE HTTP API.
CLI
You can also use the PIPEFORCE CLI in order to execute a single Command. Here is an example to call the log command and set the message parameter accordingly:
Body
Also see: Pipeline Body.
Besides parameters, a command can also consume and produce a body, similar to an HTTP POST request and response.
Unlike parameters, the input body is typically a more complex document and/or a bigger data stream which must be modified in some way. Therefore, it is passed in and written out via the body by default.
Here is an example to pass JSON data via the body to a cache.put command using an HTTP POST request and the curl tool:
As you can see, the command parameter key has been set as a request parameter here, and a JSON string is set in the HTTP POST body using the -d switch of curl. This JSON becomes the input body (data) for the cache.put command. In order to have this JSON string automatically parsed into a JSON instance and used as such in the command, you can optionally specify the Content-Type: application/json header.
NOTE
Authentication is done here by using the Authorization header. See Authorization for details.
Replace http://hub-trial.pipeforce.org with the URL of your target system.
See HTTP Execution Reference for a summary of all supported HTTP options.
Pipeline in Detail
Two or more Commands can be chained into a flow, called a Pipeline. When such a pipeline is executed, its commands are executed one after another, and the output message of the first command becomes the input message of the next command, and so on:
By default, such a pipeline is written as a script in the YAML format, which you can then manage with your preferred source code management tool like Git.
Here is an example which connects two simple commands: The datetime command produces the current date and time and the log command finally logs it:
In the YAML, a pipeline definition starts with the pipeline: element, followed by a list of commands. Each command to be executed is defined as a list element by its name, using an indent and a dash (-).
By default, the message output (= body) of the first command (datetime in this example) will automatically become the input message (= body) of the next command (log in this example). Therefore, no declaration and exchange of variables is required here. The exchange of body data between commands is implicit.
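The implicit body exchange described above can be sketched like this. It assumes that the log command falls back to the current body when no message parameter is given, which the description implies but this page does not state explicitly.

```yaml
pipeline:
  # Writes the current date and time to the body.
  - datetime
  # Receives the body of the previous command as its input.
  - log
```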
Parameters
In case you need to configure a command by specifying parameters for it in a pipeline, you can do so by writing them below the command as name-value pairs, indented by at least two additional spaces. Note that YAML only permits spaces for indentation, not tab characters (see the YAML specification for this):
You can write parameter values without any quotes. Optionally, you can use single quotes for parameter values:
or double quotes:
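The three quoting styles can be sketched as follows, using the log command and its message parameter from this page. In all three cases the parameter receives the same string value.

```yaml
pipeline:
  # Unquoted value
  - log:
      message: Hello World
  # Single-quoted value
  - log:
      message: 'Hello World'
  # Double-quoted value
  - log:
      message: "Hello World"
```

Quotes become necessary when a value contains character sequences that YAML treats specially, such as a colon followed by a space, or a space followed by a hash sign.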
Multi-line parameters
Parameters can also be multiple lines long:
In case you need to pass multi-line parameter values with the line breaks preserved, you can do so by using the pipe character (|):
There are many more options for formatting line breaks in YAML. For full details, see the YAML specification:
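As a sketch of the two multi-line variants mentioned in this section, again using the log command for illustration: a plain value that continues on the next line is folded into one line, while the pipe character keeps the line breaks.

```yaml
pipeline:
  # Plain multi-line value: YAML folds the line break into a single space.
  - log:
      message: This text spans
        two lines in the script.
  # Literal block (pipe): line breaks are preserved as written.
  - log:
      message: |
        First line
        Second line
```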
JSON Parameters
Furthermore, it is possible to specify a JSON as parameter, like this example shows:
As you can see, there is no need to escape or convert the JSON to a string. It can be placed as JSON 1:1 inside the YAML.
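A minimal sketch of a JSON value placed directly inside the YAML is shown below. The log command is used only to illustrate the syntax; whether its message parameter accepts an object rather than a string is an assumption, and the field names are made up.

```yaml
pipeline:
  - log:
      # The JSON object is written as-is; YAML parses it as a nested
      # structure, so no quoting or escaping is needed.
      message: {"text": "Hello World", "count": 2}
```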
PEL Parameters
Besides static string values, it is also possible to pass dynamic values to the parameters. This is done by using a Pipeline Expression. For example:
Also interpolation inside strings is possible out-of-the-box:
For more information about Pipeline Expressions, see section:
Executing a Pipeline
Execute by HTTP request
Executing a Pipeline with HTTP is simple:
Create the Pipeline YAML script
Upload this YAML script in an HTTP POST request to the server
The server then executes your Pipeline and returns with the final output message.
As an example let's assume a more sophisticated pipeline like this, with different parameter value formats:
This example loads a PDF file, sets a text on it and stores it back. In order to execute this Pipeline YAML, you can send it to the server using curl, for example:
When using cURL, the heredoc (EOF) syntax is required here in order to preserve the indentation of the YAML script. Optionally, you can write the YAML in a local file and pass this file as the argument for the body. See the example below, where the pipeline is specified in a file named mypipeline.yaml:
See HTTP Execution Reference for a summary of all supported HTTP options.
Sending the Body Message with HTTP
You can also send the pipeline body in an HTTP request. See this example:
Note that the value of the body can be a primitive or a JSON without additional escaping required.
Here is an example of how to send this using curl:
The log output will be:
Execute in Portal
The portal offers an advanced online editor with syntax highlighting, code completion and debugging support, where you can write pipelines and execute them online. This is the easiest and preferred way to execute and test your pipelines ad hoc. Here you can see a simple pipeline after its ad-hoc execution in the online editor:
Execute in CLI
Another approach to execute a pipeline is by using the Command Line Interface (CLI).
Execute local pipeline file
Let's assume you have a local pipeline YAML stored at src/global/app/myapp/pipeline/test.pi.yaml inside of your PIPEFORCE workspace. Then you can execute it via this CLI call (the path must start with src/):
This will load the local pipeline YAML and run it by sending it to the server for execution. The result will be printed out to your terminal if there is any.
Execute persisted remote pipeline
In case you have stored your pipeline on the server side in the Property Store, you can execute it using this call (the path must start with global/):
This command searches for a property in the property store with path global/app/myapp/pipeline/test and executes it. Finally, it sends any result back to your terminal.
Pipeline Sections
Every pipeline YAML script may consist of four main sections:
headers
vars
pipeline
body
Here is an example of a pipeline script which defines all of these sections:
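A minimal sketch of such a script, assuming nothing beyond the four section names listed above, could look like this. The header name, the variable name and all values are hypothetical and chosen for illustration only.

```yaml
headers:
  # Hypothetical header name, for illustration only.
  description: "Example pipeline with all four sections"
vars:
  # Transient variable; the name is an assumption.
  greeting: "Hello World"
pipeline:
  - log:
      message: "Pipeline started"
body: "Initial body value"
```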
All sections except pipeline are optional in a pipeline script. Even if a section is not explicitly defined in the pipeline script, its scope exists implicitly. That means you can access it and read or set values on it without declaring it in the pipeline, for example by using a pipeline expression (PE).
headers
The headers section is optional. A header is a name-value pair which defines "global configuration" hints and settings for the given pipeline. Only text is allowed as content, i.e. no complex objects like JSON. Headers are not meant to be changed during pipeline processing, even though this is possible in rare cases.
Whether and which headers are required depends on the pipeline and its commands. There are some default headers in order to configure the pipeline processing. See the headers reference for details.
It is similar to HTTP Request Headers.
You can read and set values in the headers section using the Pipeline Expression Language (PEL).
vars
The vars section is optional and contains transient variables as name value pairs. It is meant as a transient scope for states during the pipeline processing.
Values can also be complex objects and documents like JSON for example.
Values can be changed during pipeline processing.
You can access values in the vars scope using the Pipeline Expression Language (PEL).
pipeline
The pipeline section is mandatory and lists all commands which must be executed, in the given order.
See the commands reference for details about the default commands.
You can set dynamic parameter values on commands using the Pipeline Expression Language (PEL).
body
The body section is optional. It defines a single object to be used as “data pool” or transformation data during the pipeline processing.
In case a command returns a value, it will by default write this value to the body implicitly. A previous command's value in the body will be overwritten by the command which comes next.
It's also possible to define an initial value for the body in the pipeline. If no such initial value is set, the body is initially null.
You can access values in the body scope using the Pipeline Expression Language (PEL).
Pipeline as JSON
Sometimes it is necessary to use JSON as the pipeline definition language instead of YAML. Let's assume a pipeline written in YAML like this:
You can rewrite this YAML pipeline as JSON pipeline like this:
In order to execute such a JSON pipeline, you can send it to the same POST endpoints as you would with YAML pipelines, but with the header changed to Content-Type: application/json.
Pipeline as URI
Besides YAML and JSON, there is a third option to define a pipeline: using a pipeline URI, which is an inline version of a pipeline. This is handy in case you must define a pipeline as a “one-liner”.
You can rewrite any pipeline YAML as a single URI request query string.
Let's assume this pipeline YAML:
You can rewrite this pipeline YAML as inline using a URI request query string, which looks like this:
You can then execute such a pipeline URI using the CLI:
So the format is like this:
The request parameter names become the command names and the parameters to this command become the request parameter values. Each single command parameter name ends with a colon (:). Multiple command parameters are separated by a semicolon (;).
This is compliant with the URI syntax and especially handy for smaller pipelines which you want to execute ad hoc from your terminal, or where you can define a pipeline as a one-liner only. It therefore contains multiple instructions in one line.
Auto-completion support
In order to enable auto-completion support for your pipeline YAML scripts in your local development editor, you need an editor which supports YAML schema validation. Then, you can have auto-completion which shows all available commands and their parameters:
Auto-completion in IntelliJ
To enable auto-completion in IntelliJ, open preferences and navigate to JSON Schema Mappings:
Preferences → Languages & Frameworks → Schemas & DTDs → JSON Schema Mappings
Add a new schema mapping with these values:
Name:
pipeline-schema
Schema URL:
https://hub-<NS>.pipeforce.net/api/v3/command:pipeline.schema.get
Schema version:
JSON Schema version 7
Add a new file path pattern: *.pi.yaml
Now, try it out: Create a new file foo.pi.yaml and start typing pipeline: in it. The key combination [Ctrl] + [Space] should give you a list of suggested values for your YAML.
A YAML pipeline script should always end with the suffix .pi.yaml, which stands for pipeline scripts written in YAML.
Auto-completion in Visual Studio Code
If you also want to enable code completion for your pipeline YAML files in VS Code, first install the YAML language support plugin from Red Hat: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml
Then open Preferences → Settings, add this line to your settings.json configuration and save it:
Now, try it out: Create a new file foo.pi.yaml and start typing pipeline: in it. The key combination [Ctrl] + [Space] should give you a list of suggested values for your YAML.
A local YAML pipeline script should always end in suffix .pi.yaml which stands for pipeline scripts written in YAML.
Auto-completion in the Portal
The built-in online workbench and the playground in the PIPEFORCE portal support pipeline script completion out of the box.
To start completion, simply press [Ctrl] + [Space].
Besides completion for available commands and their parameters, it also supports completion for other parts, like utilities and variables, for example: