Confessions of a Software Developer: 2021

Monday, November 15, 2021

OIC: Synchronous versus Fire-and-Forget Process to Process Invocation

In the following article I explain why you should implement a Structured to Structured Process call as synchronous when you need that call to be recoverable in case of an issue.

To call a Structured Process (that is initiated by a Message Start event) from another one, there are 2 ways to do so (actually there are 3 if you count in the Micro Process feature but that is a variation of one of these, I suspect the first):

Using the /ic/api/process/v1/processes API
Using the WSDL of the called process

I typically do the latter, as I find it to be simpler than using the API because importing the WSDL involves an automatic import of the XSD schema containing the request definition. Copy & paste of the WSDL URL and you're done and when the interface changes, all I have to do is re-import the schema (in contrast, when using the API I have to manually figure out the proper JSON sample).

However, whichever way you use both concern a web service based interface, which you can configure to invoke in one of two ways:

As Fire-and-Forget
As synchronous (request/response)

Question is which one to use? The answer is simple: use synchronous in case there is no callback and you want to be sure the invocation either succeeds or can be recovered when it fails. If you don't care, use Fire-and-Forget. Read on to find out why.

First let me point out the Fire-and-Forget Enterprise Integration Pattern. As it will tell you, in case of Fire-and-Forget error handling is not possible and you would need some Guaranteed Delivery mechanism to prevent the risk of losing messages. Mind that web service based invocation is not message based (which would involve using a Message Channel and with that typically an Invalid Message Channel to capture bad messages). So, in case of Fire-and-Forget there will be the risk of running into a non-recoverable error (hence the conclusion). OIC has no "magic feature" supporting recovery from failed invocations to Fire-and-Forget web services. And mind you, this is consistent with the pattern.

Note: in case of (asynchronous) process-to-process invocation with callback you do have a recovery point, which will be on the the callback. When using a Send/Receive pair of activities you can put a boundaryTimer Catch event on the Receive activity that you can model to go into a recovery flow when it does not receive a response in time. This article is about the situation when there is no such callback.

So when can this fail, and what are the consequences? To illustrate I have created a parent Process application that calls one of two child Process applications where one is Fire-and-Forget and the other synchronous. There are 2 different situations that can lead to an issue:

The child process is not available
The child process receives an invalid request it cannot process

To simulate the second I connected the two processes and then modified the child process to have an extra element in its interface that I map at the start event. If not provided, the mapping will result in a selectionFailure (NPE).

The Fire-and-Forget child process looks as follows. I have put a user activity in it as a simple means to make it stop after receiving the message.

The synchronous child process looks as follows. I have put a Message Throw event with name "Response" right after the Start event. The Respond event is configured as a synchronous response to the Start event.

Invalid Request

Now what happens when I let the parent call these child processes with the invalid request? The following shows the instances as you can find them in Workspace -> Processes:

What you see here is that in case of Fire-and-Forget the following happens:

The parent with title "SM FF versus Sync [non-recoverable]" succeeds and get the Completed state
The child with title "SM Child FF" errored

The flow of the parent looks like this:

The flow of the child looks like this:

As you can see the instance of the child rolled back to the start event and is not recoverable. The only option is Abort. As the parent instance is in state Completed, you also cannot recover from there (obviously).

What you see regarding the synchronous invocation, is the following:

The child with title "SM Child Sync" fails and it retries 2 times
After 3 tries the parent gets the Errored state

The parent is now in state "Recoverable" and its flow looks like this:

I can now recover the parent using Alter Flow and add the missing element that the child failed on:

The parent now succeeds:

And a new instance of the child process that initially failed is now in state In Progress (the bottom one initially fails, the top one now succeeds):

Child not Available

The above covers the question what happens in case the child process receives an invalid request. Now what happens when it is not available? There can be 3 situations that would make it so:

The child has been Shut Down
The child has been Retired
The child has been Deactivated (undeployed)

The following shows what happens:

As you can see, in case of Fire-and-Forget the parent only gets in an errored, retriable state when the child is deactivated. In all other cases the parent succeeds. In contrast, in case of a synchronous invocation, the parent fails in all three situations.

Conclusion

So conclusion is that for process to process invocation without callback you must make the invocation synchronous if you want to be able to recover in case of an issue. In case of Fire-and-Forget this is not possible (as the enterprise pattern argues).

Friday, July 23, 2021

How Granular Should My Microprocesses Be?

As with all modularization principles, finding the right granularity is not always trivial and the more important. Some of us have seen complete projects fail because of getting this wrong. The Microprocess Architecture is no exception to this rule and in the following I discuss this topic, hoping to guide you in getting it right.

The Microprocess Architecture provides a solution for handling changes regarding long-running processess. As explained in the article, introducing the Microprocess Architecture the rationale for applying it consists of a combination of reducing impact when implementing new features and bug fixes, ease applying them to an already deployed business process, supporting parallel development and few others. Said differently and in one word: agility.

To correct the mistake made in the introducing article of not defining what a microprocess stands for:

A microprocess is a subprocess of a larger business process, where the subprocess spans the execution of one or more activities to reach a business significant state change of the business process, and which can be developed and deployed as a stand-alone component.

This definition implies that the scope of a microprocess has business visibility. However, as such that does not yet clarify the right granularity. Too coarse-grained and there is a risk of not delivering on the core value of agility, too fine-grained and you risk issues with performance and scalability.

Too coarse grained

Too fine grained

So, what is the right granularity? First let me try to illustrate by example. I then capture some of the main characteristics that should give you guidance on how to apply it for your use case.

Order handling example

An order handling process of a bank, that starts with a customer submitting an order form and ends with invoking one or more back-end systems to handle the delivery, could consist of the following microprocesses:

· Customer checks: execution of several checks to determine if the bank can and should provide the product to the customer (e.g. criminal record check, credit check, etc.). This could involve orchestration of several calls to back-end systems and even services external to the bank and may involve human intervention to deal with the situation that one or more checks fail (alternate scenarios). The state reached by this microprocess is “customer validated”. In the happy scenario where all checks succeed, all is done in a time span of seconds. But when one or more checks fail and some bank employee must decide, it could take hours or even days (especially with a weekend in between).

· Generate quote: determining the price and conditions for the ordered product(s). For a customer order for a combination of a current account, a savings account and a credit card this could involve the orchestration of calls to 3 different back-end systems (to get the price and conditions of each individual product) plus a call to some business rule to check if it is a valid product combination, another business rule to determine if some discount should be applied and finally a call to some service to retrieve the conditions for the order. The “quote generated” state is reached when the price and conditions of the order are presented to the customer, either online (in the same session) or by sending a link to some secure inbox to review it later (when the session is already closed). In the happy scenario all is done in a time span of seconds.

· Sign order: signing of the order by all required signers. This can be anything from signing by one individual for a private account, up to some board of directors of a company in case of a business account. In the latter case the time span might range from minutes up to days or even weeks. The “order signed” state is reached when the order is signed by all signers.

Finalize order: execution of steps to persist all data, determine delivery dates, send an order confirmation to the customer and initiate the back-end system(s) to deliver the products. The “order finalized” state is reached when order delivery has started. In the happy scenario this is done in a time span of seconds.

At a higher level there will be a process in which you can clearly recognize each individual step, be that as activities in a structured process flow (BPMN), or as activities in a case management application, like below:

When drilling down in any of the activities you might find a relatively complex structured process model. For example, the structured process backing the Sign Order activity generates the order agreement, handles multiple signers that may sign over a somewhat longer period of time, includes a loop for sending reminders after some deadline has been reached, and handles order cancellation when it has expired.

All states can be reached by the higher-level process in a timespan of a few seconds to minutes, which qualifies it as near “straight-through processing” (near-STP). But in case of issues like some external system being unavailable, human intervention by an Applications Administrator may be required which might not even happen the same day.

Microprocess characteristics

The following characteristics can help with determining the right granularity for your use case:

All activities in the same microprocess are tightly coupled to achieving a state of the process that is meaningful to the business. Put differently: when you can’t explain its purpose to a businessperson, it’s not a microprocess.
Although the happy scenario might concern near-STP, a microprocess typically involves human intervention for handling alternate or exception scenarios(*). When no human intervention of any kind is applicable, it’s not a microprocess. Therefore, microprocesses are asynchronous by definition.
Processing of the average microprocess has a timespan ranging from seconds to days (in case of human intervention). Weeks are very rare exceptions. Therefore, the chance of a future need to “patch” an in-flight microprocess instance (that is: migrate it from one version to the next) is minimal if not non-existent. When in-flight instance migration is expected to be commonly required, it’s too course-grained so don’t implement it as a microprocess.
With very few exceptions a microprocess can be replaced by a newer version without impact on any of its peers. There can be some impact on the higher-level process, which tends to be restricted to its interface (for example an extra id that needs to be passed on). Vise verse, changes at the higher-level process level do not impact a microprocess. When a change has a high probability of impacting a peer it’s too fine-grained and it implies they should be part of the same microprocess.
A microprocess is an autonomous deployable unit and can be deployed on a different tier than the higher-level process. Moving a microprocess from one tier to another will only have impact on the endpoint used by the higher-level process.

(*) Alternate scenarios in the end reach the same result as the “happy scenario” but in a different way. Exception scenarios are those when things go wrong and someone (typically an Applications or Systems Administrator) must intervene to put the process back on track.

As the goal of business process automation is to reduce human intervention, in the end the result might be a process without any human task (mind this is still a business process). Consequently, a microprocess does not need to have human tasks but human intervention will be applicable to recover in exception scenarios.

Like in the example given, microprocesses orchestrate human intervention with zero or more (synchronous or asynchronous) “services”. The services orchestrated by a microprocess, on their turn can be build using different technologies ranging from synchronous web services (like an OIC Integration) to an asynchronous structured (BPMN) processes of its own. However, the latter does not qualify as a microprocess. It’s just another way of implementing a “service”.

Mind that not every activity in the higher-level process necessarily concerns a business process state change. Most business processes include a few “technical” activities for housekeeping kind-of logic and keep track of technical state changes (like “process initiated”, “service errored”, etc.). A typical example of such activities is some “Initiate Process” activity (the “Receive Order” in the example) which enriches and persists the data from the start request message and instantiates the internal business object the higher-level process works with. Such activities have no meaning to the business (and therefore do not concern microprocesses) but you can’t leave them out of the model either. They are typically tightly coupled to the higher-level process, tend to be part of the same deployable unit and are often implemented as structured processes to provide points of recovery in case of issues.

Friday, April 23, 2021

OIC: Working Around not Having Complex Gateway

In this article I describe how you can work around not having the Complex Gateway in OIC Structured Process. I will end with what I believe to be the best work around with respect to support for refactoring.

The Complex Gateway in BPMN 2.0 is one of the least used features of BPMN. However, when you find a use case for it also may find that there is no alternative way to model it, or not an easy one. The challenge being that in case of parallel flows (be it via Parallel or Inclusive Gateway) the token must move to the merge gateway for each individual flow before it can move further.

One typical use case that I have ran into a couple of times is the one where at a specific point in the process there is more than one way to do something, and either one of them might happen after which the process can move on to the next activity. In case there are only two ways, most of the times you can model this by adding an interrupting Boundary Message or Timer Catch event to the activities.

For example, in the next process there is a Human Task in an order handling process where either 1 of 3 events can happen:

The order is handled
The order is cancelled
The order is expired

Obviously, handling is done via the task. Cancellation can be handled by the Message Boundary Event that exposes a "cancel" operation that - when called - will interrupt the task and move to the end. Expiry can be done via the Timer Boundary Event that - when expired - will interrupt the task and move to the end

More complex do things become when you have two parallel tasks. In the following example an order can be updated by the Customer via the Update Order task, as long as it is not yet handled by the Clerk via the Handle Order task. The Customer can also cancel the order which should withdraw the Handle Order task.

The solution chosen in this model is a Message Boundary Event on each Human Task, one to withdraw the Update Order task and one to withdraw the Handle Order task in case the Customer cancelled the order. It uses an integration that can be called after either one of the tasks is executed, which on its turn will then call the Message Boundary Event of the other task to withdraw it.

A more clever solution is a more generic integration that leverages the OIC tasks API to withdraw any task based on a unique identifier (identificationKey). I could then leave the Message Boundary Events out of the model for withdrawing Human Tasks but that would not work for other types of activities or events. "He, but what about Event Based gateway?", you may think now. Not going to work, because you cannot use that in combination with a Human Task.

As you can see, the second example is much more complex than the first one, even though I left the order expiry use case completely out of the picture. In contrast, the model would look pretty simple when we would have a Complex Gateway. even with the expiry use case included (note: I "photo-shopped" the complex gateway 😉):

You will appreciate that with the work-arounds mentioned before, adding alternate scenarios (like a third human task or an operation to support an update via an integration instead of a task) will soon make any model incomprehensible even for the most experienced BPMN modeler. It just does not scale.

The best approach I could think of is as in the next model, which I have implemented for a real customer case:

The trick is that each individual, parallel flow has its own Embedded Subprocess with a Message Boundary Event. The advantage over the previous workarounds is that within the Embedded Subprocess I'm now free to expand or change the logic by adding or removing activities, events or gateways without breaking the interface of the operations of the overall process. The Cancel Polling and Withdraw Task activities both call one single integration that invokes the boundary event of the other one. With a little bit more effort I can extend it to support more parallel flows

Obviously adding or deleting any of the parallel flows will require the integration to be changed. Still not ideal but at least this scales much better than any of the other workarounds.

Wednesday, March 31, 2021

OIC: Somewhat "Hidden" Feature: Reusable Subprocess

This blog article describes how to configure input and output parameters for Reusable Sub-processes (BPMN Call Activity) in OIC Structured Process

It's more than once that I have see that OIC Process developers are not using the Reusable Subprocess in Structured Process, simply because they are not aware that you can pass on arguments to the Start event and from the End event, like you can for processes that start and end with Message Events.

I will not discuss the benefits of and when to use a Reusable Subprocess, I will save that for a blog posting soon to come.

The thing is, the parameters are somewhat hidden. You create a Reusable Sub-process by creating a process with a None Start and None End event, as in the picture below. To add parameters to both events you can do it like this:

Click on the Start Event.
From the hamburger menu choose Open Properties. This will show you the propertie of the Start Event, where (probably to your surprise) you cannot find the input parameters.
With the Properties tab of the Start Event still shown, click anywhere on the process canvas as long as it is not another component (event, activity, flow, gateway). Ta-da!!

You can now use the Reusable sub-process in any other process using the Call Activity:

As you can see you can pass on argument to it on the Input tab of the mapper, as well as pass on arguments from it on the Output tab.

That easy ;-)

Wednesday, March 24, 2021

OIC Process Correlation: Take Good Care of Your Properties

The other day we had an issue with correlating process instances which turned out to be caused by some "mistake" we made. Took quite some time to figure it out so I thought I share it with you, hoping it can safe you some time.

First I will explain what correlation is about (you may want to check out a much more elaborate blog article on how to use it in OIC-Process by Martien van den Akker, or another one from Anthony Reynolds explaining the concept in the context of BPEL). Correlation is in OIC-Process not really different from what it is in Oracle BPM Suite so if you already know that or when you have done it before in OIC you can skip the next paragraph.

When one process instance is calling another one and of the latter there may be multiple instances, you need a way to make sure the second process calls back the right instance of the first process. That is done by making that the instance to call can uniquely be identified, or "correlated" as it is called. In many cases correlation is out-of-the-box, like for synchronous calls or asynchrounous calls using WS-Addressing. When there is no out-of-the-box correlation, you need to configure it explicitly using what is called "message-based correlation". That means that instances are correlated using a key (value or combination of values) which is (part of) the message that is send from one instance to the other. In OIC that key is called (not surprisingly) the "correlation key" (same as "correlation set" in BPEL). The correlation key has one or more "properties" for which the (combination of) values must be unique in such a way that at any time there cannot be two or more instance flows of the calling process using the same correlation key value(s).

The issue we ran into is that we had to call the same child Structured Process from a parent Structured Process in parallel and that we made a "mistake" with defining the correlation key. The mistake being that we defined 2 correlations keys for 2 parallel flows, calling the same Structured Process but using different properties. When using 2 correlation keys sharing the same property it worked.

The actual process model is similar to the following:

Both parallel flows call the same process. In the example the child process takes a string input argument and performs a callback using that same value. The child is started in the Send activity. To make that the proper flow is being called back in the Receive activity, correlation must happen in the callback. The way to do so is to set up 2 different correlation keys, 1 for each flow and then initiate a unique correlation in the Send activity and in the Receive activity correlate on that unique value. In the example the way to do so is like the following:

This shows how two correlation keys, "ck1" and "ck2" are defined, both having the same "property1". The next picture shows how correlation is initiated in the Send activity:

The correlation happens in the Receive activity as show in the next picture:

What could possibly go wrong, right? Well, you could define your correlation keys like this:

The difference being that the correlation keys have a different property. Net effect being that after either one or both the sub-processes finished without any issue, the parent still waits for the callback that never will come:

When setting up correlion with the same process, there should be no reason to have different properties in the correlation keys, so other than being an inconvenience when you accidentally do it wrong, it should not be a practical problem. I do hope though that with some next release this issues goes away either by that having different properties is supported (might be a reason why that is not a feasible solution) or that you are blocked from configuring it wrong.

By the way, another easy to make mistake is to use "Initialize" instead of "Correlate" in the activity that should correlate. Happens to the best.

Confessions of a Software Developer