Friday, February 18, 2022

OIC: How-to Do Error Handling When Integrating With Fusion ERP

In the background an Integration with a REST interface may be calling one with a SOAP interface and vice versa. Mapping errors from SOAP to REST or the other way around is not trivial. In this blog article I explain how you can handle this when using OIC Integration in combination with Oracle Fusion Cloud ERP, at least for most of the cases (no guarantee I cover all 😉 ).

The way your Integration returns a fault is determined by a combination of the type of the adapter (SaaS, SOAP or REST) and how you catch an error coming from the back-end service. As explained in the blog post Fault Handling in OIC you have the most control over how the Integration propagates a fault from a back-end service when invoking it within a Scope and catching it using a Fault Handler with a Fault Return. In this article I assume you follow that pattern.

OIC determines the format of the Fault Return. In case of a SOAP fault this will be a SOAP 1.1 Fault for which the values of the elements are predetermined, except for the <detail>. When using Fault Return the fields you defined in the <fault> of the WSDL end up as structured elements in the <detail> of the fault of the integration. As you can see in this example, I used a format that practically wraps a "SOAP fault" in the <detail> of the actual SOAP fault so that I can give my consumer the best information.
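To make that wrapping concrete, such a fault could look like the following sketch (the namespace, the element names inside the <detail> and all values are invented for illustration; only the SOAP 1.1 fault structure itself is a given):

```python
import xml.dom.minidom as minidom

# Illustrative SOAP 1.1 fault in which the <detail> wraps the fields of the
# modeled WSDL fault, effectively nesting a "SOAP fault" inside the real one.
soap_fault = """<SOAP-ENV:Fault xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <faultcode>SOAP-ENV:Server</faultcode>
  <faultstring>Order validation failed</faultstring>
  <faultactor>Server</faultactor>
  <detail>
    <flt:code xmlns:flt="http://example.com/fault">JBO-27024</flt:code>
    <flt:message xmlns:flt="http://example.com/fault">Failed to validate a row</flt:message>
  </detail>
</SOAP-ENV:Fault>"""

# Parse it back to show the consumer can pick the wrapped code out of <detail>.
doc = minidom.parseString(soap_fault)
detail_code = doc.getElementsByTagName("flt:code")[0].firstChild.data
print(detail_code)  # JBO-27024
```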



In case of a REST fault all the elements of the Fault Return are fixed but their values are not. In the following example I return a 406 Not Acceptable with a message indicating that the username is missing in the header:
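Sketched as a payload (the field names follow the fixed type/title/detail/errorCode structure used in the mappings later in this article; the values are illustrative):

```python
# Illustrative REST Fault Return payload for the 406 example. The elements
# (type, title, detail, errorCode) are fixed; their values are up to you.
fault = {
    "type": "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.7",
    "title": "Not Acceptable",
    "detail": "The username is missing in the header",
    "errorCode": "406",
}

print(fault["errorCode"], fault["title"])  # 406 Not Acceptable
```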



When integrating with Fusion ERP Cloud you typically use one or more of the following options (in order of decreasing preference):

  1. Using the Oracle ERP Cloud Adapter
    This is the preferred way because it provides a single Connection for which you only have to configure the ERP Cloud Host and credentials. When using it, you can search for or browse through all business objects it supports or events you can subscribe to. When invoking a service of the adapter, the fault is returned as a "ServiceException". When using one of the REST resources it returns an APIInvocationError.
  2. Using REST services
    Some REST services provide features not available through the Cloud Adapter, and more and more REST services are added over time (although they are also being added to the Cloud Adapter). In practice you may find a few use cases for them, so brace for ending up with a hybrid situation. It is more work to use a REST service than the Cloud Adapter. You still only configure one Connection with the base URL and credentials, but when using it you must figure out the URI for each individual object (the last part of the URL that specifies the resource(s) and optional template parameters) and then copy & paste the sample request and response payloads. On the bright side: the fault returned by a REST service is the same APIInvocationError as for any other REST service and, as I explain below, mapping an APIInvocationError is much easier with the REST services than with the Cloud Adapter services.
  3. Using BIP Report Service
    For a few reasons this is a last resort, only to be used when the previous 2 are not an option. The modeled Faults returned by the BIP Report Service are AccessDeniedException, OperationFailedException, InvalidParametersException.

For completeness: ERP Cloud also provides SOAP services, but except for the option of calling them with a tool like SoapUI (for trial & error to determine a request that actually works) they do not offer any added value over the adapter, as in the background the adapter uses the same services. This option is therefore not considered in this article.

When you don't catch and handle a fault from a back-end service, a SOAP Integration will return a fault in which the <faultstring> contains the generic text "ICS runtime execution error", while the useful fault information is buried in a "blob" of CDATA in the <detail>.


In case of a REST Integration an HTTP status code 500 is returned (no matter what the back-end error was) with, if you are lucky, some information about the actual cause in the errorDetails.title.


All this is not very helpful for the consumer, especially not when the Integrations are used in a UI or by people with little to no knowledge of ERP. In that case you may want to return a fault populated with useful information about the back-end fault. Now how to do that?

ServiceException

The ServiceException is a SOAP fault that wraps a <ServiceErrorMessage> element in its <detail>, which can be used to map from. There is a pretty straightforward mapping to a modeled SOAP fault, although choices may have to be made.

It has a top level Code, Message and Severity element plus a Detail element with a Code, Message and Severity of its own (and a Detail with ... etc. but forget about that). The Code is not always filled out and often (always?) the Code and Message at top-level are pretty generic and not very useful for the consumer. Therefore, you probably want to use the <detail><message> instead. But in practice perhaps it depends. To be honest I don't know. For the limited use cases I dealt with, I ended up using the <detail><message>. Make sure you figure it out for the type of faults you want to catch.

What I ended up doing is this:

ServiceException -> SOAP fault

<faultCode>: use a nested choose-when-otherwise to map the <ServiceErrorMessage><detail><code> if there is one, otherwise the <ServiceErrorMessage><code>, and otherwise the <ServiceErrorMessage><exceptionClassName>

<faultString>: the <ServiceErrorMessage><message>

<faultActor>: "Client" when the issue is at the client-side, "Server" otherwise

<detail>: the <ServiceErrorMessage><detail><message>

Makes it look like this:


I could have made it a bit smarter by checking if the string "JBO-" is in <ServiceErrorMessage><message> and then use the substring-before the ":".
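In Python terms the mapping logic looks roughly like this (a sketch only; in OIC this is an XSLT choose-when-otherwise in the mapper, the env:Client check for the faultActor is my assumption borrowed from the REST variant of the mapping, and the sample input values are invented):

```python
# Sketch of the ServiceException -> SOAP fault mapping described above.
# The dict mimics the <ServiceErrorMessage> structure.

def map_service_exception(msg: dict) -> dict:
    detail = msg.get("detail", {})
    return {
        # detail code if present, else top-level code, else the class name
        "faultCode": (detail.get("code")
                      or msg.get("code")
                      or msg.get("exceptionClassName")),
        "faultString": msg.get("message"),
        # "Client" when the issue is client-side, "Server" otherwise
        "faultActor": "Client" if msg.get("code") == "env:Client" else "Server",
        # the detail message is usually the most useful text for the consumer
        "detail": detail.get("message"),
    }

sample = {
    "code": "env:Server",
    "message": "JBO-27024: Failed to validate a row",
    "exceptionClassName": "oracle.jbo.ValidationException",
    "detail": {"code": None, "message": "The value of attribute Name is invalid."},
}
fault = map_service_exception(sample)
print(fault["faultCode"], "|", fault["faultActor"])  # env:Server | Server
```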

ServiceException -> REST fault

type: when <ServiceErrorMessage><code> is "env:Client" then "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1", otherwise "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1"

title: when <ServiceErrorMessage><code> is "env:Client" then the title that best fits the actual cause if you are able to get that from the <ServiceErrorMessage><code>, or - when you are lazy - "Bad Request", otherwise "Internal Server Error"

detail: the <ServiceErrorMessage><detail><message>

errorCode: when <ServiceErrorMessage><code> is "env:Client" then the code that best fits the actual cause, or 400 when you are lazy, otherwise 500

Makes it look like this:


APIInvocationError

As mentioned above, a REST error is a straightforward APIInvocationError, with top-level elements "type", "title", "detail", "errorCode" and "errorDetails". In turn, "errorDetails" has subelements "type", "instance", "title", "errorPath" and "errorCode", and these give you the information about the actual back-end error.

What I ended up doing is this:

APIInvocationError -> SOAP fault

<faultCode>: APIInvocationError.errorDetails.errorCode

<faultString>: APIInvocationError.errorDetails.title

<faultActor>: "Client" when APIInvocationError.errorDetails.errorCode starts with "4", "Server" otherwise

<detail>: APIInvocationError.errorDetails.instance

Makes it look like this:
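As a Python sketch (not OIC code; the sample payload values are invented), this direct mapping is simply:

```python
# Sketch of the APIInvocationError -> SOAP fault mapping: a direct copy of
# the errorDetails fields, plus a "starts with 4" test for the faultActor.

def map_api_invocation_error(err: dict) -> dict:
    d = err["errorDetails"]
    return {
        "faultCode": d["errorCode"],
        "faultString": d["title"],
        # 4xx codes point at the caller, anything else at the server
        "faultActor": "Client" if str(d["errorCode"]).startswith("4") else "Server",
        "detail": d["instance"],
    }

sample = {"errorDetails": {
    "errorCode": "404",
    "title": "Not Found",
    "instance": "The requested resource could not be found.",
}}
print(map_api_invocation_error(sample)["faultActor"])  # Client
```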


APIInvocationError -> REST Fault

As I explained in Fault Handling in OIC (https://kettenisblogs.blogspot.com/2020/08/fault-handling-in-oic.html), this concerns a very straightforward mapping:

type: APIInvocationError.errorDetails.type

title: APIInvocationError.errorDetails.title

detail: APIInvocationError.errorDetails.instance

errorCode: APIInvocationError.errorDetails.errorCode

Makes it look like this:


BIP Report Service

It is very uncommon to run into any of the modeled faults once you have created and tested an Integration using the BIP Report Service. I remember seeing one only once in many months, and I forgot what caused it. Runtime errors happen often enough: user password expired, BIP report not properly deployed, response size bigger than 1MB, etc.
 
In my experience you can best capture the errors coming from the BIP Report Service using the Default Fault Handler only. This gives you a <fault> with subelements <errorCode>, <reason> and <details>.
 

BIP Report Service error -> SOAP Fault

This is a pretty straightforward mapping. Not the clearest, but as good as it gets:
 

<faultCode>: when <fault><details> contains "CASDK" then the substring-before the ":" of the <fault><details>, otherwise the <fault><errorCode>

<faultString>: when the <fault><details> contains "Fault Reason" then the substring after "Fault Reason : " and before "]]", otherwise the <fault><reason>

<faultActor>: "Client" when <fault><details> contains "env:Sender", "Server" otherwise

<detail>: the <fault><details>

  

Makes it look like this:
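The substring juggling boils down to the following sketch (Python's split() stands in for the XSLT substring-before/substring-after functions; the sample fault values are invented):

```python
# Sketch of the BIP Report Service <fault> -> SOAP fault mapping.

def map_bip_fault(fault: dict) -> dict:
    details = fault["details"]
    # faultCode: the CASDK code (text before the ":") if present, else errorCode
    if "CASDK" in details:
        fault_code = details.split(":", 1)[0]
    else:
        fault_code = fault["errorCode"]
    # faultString: the text between "Fault Reason : " and "]]" if present
    if "Fault Reason" in details:
        fault_string = details.split("Fault Reason : ", 1)[1].split("]]", 1)[0]
    else:
        fault_string = fault["reason"]
    return {
        "faultCode": fault_code,
        "faultString": fault_string,
        # "Client" when the back-end blames the sender, "Server" otherwise
        "faultActor": "Client" if "env:Sender" in details else "Server",
        "detail": details,
    }

sample = {
    "errorCode": "OSB-380001",  # invented sample values
    "reason": "Invoke JCA outbound service failed",
    "details": "CASDK-0033: Fault Reason : Report not found]] env:Sender",
}
print(map_bip_fault(sample)["faultCode"])  # CASDK-0033
```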


BIP Report Service error -> REST Fault

Considering how rarely these kinds of errors happen, and what causes them when they do, I would not put any effort in finding the proper 4xx error in case of a client-side error:
 

type: when <fault><details> contains "env:Sender" then "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1", otherwise "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1"

title: when <fault><details> contains "env:Sender" then "Bad Request", otherwise "Internal Server Error"

detail: the <fault><details>

errorCode: when <fault><details> contains "env:Sender" then "400", otherwise "500"

  

Makes it look like this:


Thursday, February 03, 2022

OIC: Parallel Gateway and Multi-Threading, the Work-Around

Since using BPMN I have twice run into the situation where I noticed people confusing the BPMN semantics of the Parallel and Inclusive Gateway with the runtime behavior of the engine. In this article I explain the difference, and how to implement a kind of 'multi-threading' in Oracle Integration (OIC) Structured Process.

In BPMN 2.0 the Parallel Gateway is a modeling concept and it does not say how it should be implemented by the BPM engine vendor. It might surprise you that not only (current) Oracle Integration but also BPM process engines like Activiti and Camunda do not execute parallel activities exactly at the same time (in parallel threads). Instead they typically wait for each activity to reach a wait state (like a User or Receive activity) before the next one executes and there are some good reasons for that.

First let me explain what Parallel Gateway (and Inclusive Gateway for that matter) in BPMN 2.0 means. From the OCEB Certification Guide, Second Edition, paragraph 6.6.2 (OCEB 2™ is a professional certification program developed by the Object Management Group):

If the sequence flow is split by a parallel gateway, each outgoing sequence flow receives a token. Conditions in the sequence flows are not permitted. The parallel gateway waits for all tokens for synchronization. In this case, the number of tokens corresponds to the number of incoming sequence flows. It is not specified whether the activities A, B, and C shown in the example of Figure 6.39 are executed at the same time. It is ensured, however, that the flow at the parallel gateway is not continued until all three activities have been completed and the tokens arrived.

As you can see, as far as Parallel Gateway is concerned BPMN 2.0 does not imply multi-threading. As a matter of fact BPMN 2.0 does not specify how vendors should implement their engine at all, other than it should comply to the BPMN 2.0 semantics. 

Now I'm not from Oracle Product Development, but if I were in their shoes, my reason for not supporting multi-threading (at least not unlimited) in a Cloud-native offering like Oracle Integration would be that you don't have any control over the number of threads the customers' applications might instantiate. Obviously that implies there can be surges in memory and CPU usage that might compromise the overall performance or even stability of the environment, and with that the SLA you want to be able to maintain.

Besides performance-related arguments there are also some logical issues. All flows could potentially update the same entity at the merge and arrive there at the same time, which implies that either some locking pattern is needed to prevent deadlocks at the merge, or (alternatively) the tool should let the developer decide and configure how the merge of data changes should take place. That in turn comes with performance or complexity challenges of its own.

Just to be clear: I'm not claiming there are no BPM engines that support multi-threaded Parallel Gateways, only not the ones I know. By now I hope you understand why they might not and if they do there probably are consequences as resources simply are not unlimited.

And still, you may have a use case where you need some sort of 'multi-threaded' parallel execution. For example, a Structured Process is started from a UI and the user expects a user task to be scheduled within a few seconds. Or there are 2 user tasks in a row, both assigned to the same user, who expects a seamless transition from one to the other (this was called "sticky user" in Oracle BPM 10g). In between, multiple services need to be called that are not all that quick. Since neither the Parallel Gateway nor - by the way - Integration supports out-of-the-box multi-threading, how do you achieve that anyway?

(Spoiler alert!) The short answer is: by calling each synchronous service from its own asynchronous Service Handler, where each Service Handler does a synchronous service call, and having the individual flows of the Parallel Gateway call these Service Handlers.

To elaborate, let me start with explaining that the Service Handler is a pattern I use for calling a Service (Integration) from a Structured Process of its own. I will explain it better some other time, but for now it is good enough to know that the Service Handler does nothing other than call one service and handle any fault it may raise. By doing so you prevent the technical complexity that might be involved in exception handling from being exposed in the main flow, and as a bonus a Service Handler with complex exception handling can easily be copied and reconfigured to call another service, so it is also a development productivity booster. I rest my case.

Normally a Service Handler is implemented as a Reusable Subprocess, but to achieve some sort of parallelism we will use a Structured Process with a Message Start and Message End event, which I call "Process as-a Service" and which is called from the flows in the Parallel Gateway with a Send/Receive activity combination (as in the picture above).

The beauty of it is that it is a piece of cake to transform a Service Handler from a Reusable Subprocess into a Process as-a Service and likewise, changing the call to it is also very easy.

Reusable Process versus "Process as-a Service"

Now the trick to achieving parallelism this way is that a Receive activity implies a wait state, which means the Receive itself happens in a new transaction. The process engine will first execute the Send activity, right after that schedule the corresponding Receive activity, and in the same thread go to the next flow of the Parallel Gateway until all of them are in their Receive activity. And then it is ready to start receiving responses.

So all Send activities are still executed sequentially, but in the meantime each Service Handler can start calling its synchronous service. And because that happens in a process instance of its own, they are done in parallel. Once a synchronous service call is done, the Service Handler does the callback to the Structured Process with the Parallel Gateway. As simple as that!

You now may wonder if and how this solution would compromise performance and stability of OIC, so what is the caveat? Under high load there will be a performance penalty for sure. The engine must instantiate and handle an extra process instance for every asynchronous Service Handler, and at some point that will start to impact overall performance. However, it should not impact stability. The reason is that Send and Receive activities are message-based, meaning the messages sent to and by the Service Handler are put on a queue from which the engine can pick them up as soon as it finds the time. That is how resource exhaustion is prevented.
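As a rough analogy in Python (not OIC code, and assuming each handler call is I/O-bound): the "Send" phase initiates each handler sequentially without waiting, and the merge then awaits all responses, so the total time approaches that of the slowest call rather than the sum of all calls:

```python
import asyncio
import time

async def service_handler(name: str, delay_s: float) -> str:
    # Stands in for a Process as-a Service wrapping one synchronous call.
    await asyncio.sleep(delay_s)
    return f"{name} done"

async def parallel_gateway() -> list:
    # "Send" phase: initiate all handlers sequentially, without waiting.
    tasks = [asyncio.create_task(service_handler(n, 0.15))
             for n in ("A", "B", "C", "D")]
    # "Receive"/merge phase: continue only when all tokens have arrived.
    return await asyncio.gather(*tasks)

start = time.perf_counter()
results = asyncio.run(parallel_gateway())
elapsed = time.perf_counter() - start
print(len(results), elapsed < 0.5)  # four completions in ~0.15s, not 4 x 0.15s
```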

Big question now is: when would this work-around start to be interesting to you?

I have done some performance testing and found that pushing logic to a Reusable Subprocess adds some 40ms compared to executing the same logic in the main thread. For me this is small enough not to have to think about whether I should use a Service Handler, not even when performance is key. So I always do that. However, when comparing a Process as-a Service to a Reusable Subprocess, the first one adds more than 100ms of overhead. So when there are not that many parallel flows and the services are quick enough, this overhead will not justify doing them asynchronously. However, there will be some turning point where synchronous handling is going to be outpaced in spite of the overhead.

There are 2 dimensions impacting that turning point. The first is the number of services to call. In case of synchronous calls the Parallel Gateway will never be faster than the sum of the processing times of the individual services. In case of asynchronous calls it will never be faster than the time needed to initiate all Send and Receive activities, plus the overhead of using Send/Receive, plus the time used by the slowest service. Which points to the second dimension: the processing time of the service calls. The more parallel flows involved, or the longer the slowest service takes, the more attractive it becomes to do them asynchronously.
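These two dimensions can be put in a crude back-of-the-envelope model (my assumption: the Send/Receive overhead is roughly per flow, using the ~100ms Process as-a Service figure mentioned above; real queueing effects are ignored):

```python
# Crude turning-point model: synchronous flows run back to back, asynchronous
# flows pay a per-flow overhead but only wait for the slowest call.
ASYNC_OVERHEAD_MS = 100  # assumed per-flow Process as-a Service overhead

def sync_time(service_times_ms):
    return sum(service_times_ms)

def async_time(service_times_ms):
    return len(service_times_ms) * ASYNC_OVERHEAD_MS + max(service_times_ms)

# 4 parallel flows of ~150ms each:
case = [150, 150, 150, 150]
print(sync_time(case), async_time(case))  # 600 550, i.e. close to break-even
```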

I have created a test application with a setup where I have 4 parallel flows in my gateway. I use an Inclusive instead of a Parallel Gateway. Performance-wise that does not make a difference, but it provides me the option to execute 1, 2, 3 or 4 flows at the same time. All flows call the same synchronous Integration, which in its turn calls an external service that waits a configurable amount of time before returning a response. In this way I can vary both dimensions as I wish.


As you can imagine I could have spent days trying out all kinds of combinations (they are countless). Like you I also have many other things to do, so I limited myself to trying out and blogging about 2 test cases that I ran in a quiet hour while making sure my wife was not watching Netflix or something. Just to give you some impression of where such a turning point might be. These test cases are:

  1. Initiating 4 parallel flows, each Integration call with the same delay,
  2. Initiating 4 parallel flows, 3 Integrations with a small and 1 with a larger delay (the "slowest service").

And then I played with that until I found the point where synchronous and asynchronous were about as fast.

What I found during test 1 is that with 4 parallel flows and the Integrations having an average response time of 150ms, both options performed about the same. When the Integrations were faster (e.g. 140ms), synchronous execution was quicker, and when they were slower (e.g. 160ms) asynchronous was quicker.

What I found during test 2 is that with the 3 quicker ones having an average response time of around 125ms, the asynchronous option started to outpace the synchronous one once the slow one took 290ms or more.

Like I said, there are countless combinations I could have tried regarding the number of parallel flows and their response times, and there are also other aspects to consider, like the performance under stress of both the process and the services. So in practice you will have to load test your solution to find out what performs best in your case. Just remember: when using Service Handlers it is very easy to switch from one option to the other. What a great pattern that is!

Monday, November 15, 2021

OIC: Synchronous versus Fire-and-Forget Process to Process Invocation

In the following article I explain why you should implement a Structured to Structured Process call as synchronous when you need that call to be recoverable in case of an issue.

To call a Structured Process (one that is initiated by a Message Start event) from another one, there are 2 ways to do so (actually 3 if you count the Micro Process feature, but that is a variation of one of these, I suspect the first):

  • Using the /ic/api/process/v1/processes API
  • Using the WSDL of the called process

I typically do the latter, as I find it simpler than using the API, because importing the WSDL involves an automatic import of the XSD schema containing the request definition. Copy & paste the WSDL URL and you're done, and when the interface changes, all I have to do is re-import the schema (in contrast, when using the API I have to manually figure out the proper JSON sample).

However, whichever way you use, both concern a web service based interface, which you can configure to invoke in one of two ways:

  • As Fire-and-Forget
  • As synchronous (request/response)

Question is which one to use? The answer is simple: use synchronous in case there is no callback and you want to be sure the invocation either succeeds or can be recovered when it fails. If you don't care, use Fire-and-Forget. Read on to find out why.

First let me point out the Fire-and-Forget Enterprise Integration Pattern. As it will tell you, in case of Fire-and-Forget error handling is not possible and you would need some Guaranteed Delivery mechanism to prevent the risk of losing messages. Mind that web service based invocation is not message based (which would involve using a Message Channel, and with that typically an Invalid Message Channel to capture bad messages). So in case of Fire-and-Forget there is the risk of running into a non-recoverable error (hence the conclusion). OIC has no "magic feature" supporting recovery from failed invocations of Fire-and-Forget web services. And mind you, this is consistent with the pattern.
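A toy sketch (plain Python, not OIC) of why this matters: with fire-and-forget the dispatch returns before the child fails, so the error never reaches the caller; with a synchronous call it does, and the caller can act on it:

```python
import threading

def child_process(request: dict) -> str:
    # Fails like a selectionFailure when a required element is missing.
    return "handled " + request["required_element"]

def fire_and_forget(request: dict) -> None:
    # The caller returns without a result; an error in the child only shows
    # up in the child's own instance (here: a traceback in its thread).
    t = threading.Thread(target=child_process, args=(request,))
    t.start()
    t.join()  # only so the example terminates cleanly; no error is returned

def synchronous(request: dict) -> str:
    # The caller blocks, so an error propagates and can trigger retry/recovery.
    return child_process(request)

bad_request = {}               # the required element is missing
fire_and_forget(bad_request)   # no exception reaches the caller

recoverable = False
try:
    synchronous(bad_request)
except KeyError:
    recoverable = True         # the caller sees the failure and can recover
print(recoverable)  # True
```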

Note: in case of (asynchronous) process-to-process invocation with a callback you do have a recovery point, which will be on the callback. When using a Send/Receive pair of activities you can put a boundary Timer Catch event on the Receive activity that you can model to go into a recovery flow when it does not receive a response in time. This article is about the situation where there is no such callback.

So when can this fail, and what are the consequences? To illustrate I have created a parent Process application that calls one of two child Process applications where one is Fire-and-Forget and the other synchronous. There are 2 different situations that can lead to an issue:

  1. The child process is not available
  2. The child process receives an invalid request it cannot process

To simulate the second I connected the two processes and then modified the child process to have an extra element in its interface that I map at the start event. If not provided, the mapping will result in a selectionFailure (NPE).

The Fire-and-Forget child process looks as follows. I have put a user activity in it as a simple means to make it stop after receiving the message.


The synchronous child process looks as follows. I have put a Message Throw event with the name "Response" right after the Start event. The Response event is configured as a synchronous response to the Start event.

 

Invalid Request

Now what happens when I let the parent call these child processes with the invalid request? The following shows the instances as you can find them in Workspace -> Processes:


What you see here is that in case of Fire-and-Forget the following happens:

  • The parent with title "SM FF versus Sync [non-recoverable]" succeeds and gets the Completed state
  • The child with title "SM Child FF" errored

The flow of the parent looks like this:


The flow of the child looks like this:


As you can see the instance of the child rolled back to the start event and is not recoverable. The only option is Abort. As the parent instance is in state Completed, you also cannot recover from there (obviously).

What you see regarding the synchronous invocation, is the following:

  • The child with title "SM Child Sync" fails and it retries 2 times
  • After 3 tries the parent gets the Errored state

The parent is now in state "Recoverable" and its flow looks like this:


I can now recover the parent using Alter Flow and add the missing element that the child failed on:


The parent now succeeds:


And a new instance of the child process that initially failed is now in state In Progress (the bottom one initially fails, the top one now succeeds):


Child not Available

The above covers the question of what happens when the child process receives an invalid request. Now what happens when it is not available? There are 3 situations that would make it so:

  1. The child has been Shut Down
  2. The child has been Retired
  3. The child has been Deactivated (undeployed)

The following shows what happens:


As you can see, in case of Fire-and-Forget the parent only gets into an errored, retriable state when the child is deactivated. In all other cases the parent succeeds. In contrast, in case of a synchronous invocation the parent fails in all three situations.

Conclusion

So the conclusion is that for process-to-process invocation without a callback you must make the invocation synchronous if you want to be able to recover in case of an issue. With Fire-and-Forget this is not possible (as the enterprise pattern argues).

Friday, July 23, 2021

How Granular Should My Microprocesses Be?

As with all modularization principles, finding the right granularity is not always trivial, and all the more important. Some of us have seen complete projects fail because of getting this wrong. The Microprocess Architecture is no exception to this rule, and in the following I discuss this topic, hoping to guide you in getting it right.

The Microprocess Architecture provides a solution for handling changes regarding long-running processes. As explained in the article introducing the Microprocess Architecture, the rationale for applying it consists of a combination of reducing impact when implementing new features and bug fixes, the ease of applying them to an already deployed business process, supporting parallel development, and a few others. Said differently, and in one word: agility.

To correct a mistake made in the introductory article, which did not define what a microprocess stands for:

A microprocess is a subprocess of a larger business process, where the subprocess spans the execution of one or more activities to reach a business significant state change of the business process, and which can be developed and deployed as a stand-alone component.

This definition implies that the scope of a microprocess has business visibility. However, as such that does not yet clarify the right granularity. Too coarse-grained and there is a risk of not delivering on the core value of agility; too fine-grained and you risk issues with performance and scalability.

 

Too coarse grained
 

 

Too fine grained
 

So, what is the right granularity? First let me try to illustrate by example. I then capture some of the main characteristics that should give you guidance on how to apply it for your use case.

Order handling example

An order handling process of a bank, that starts with a customer submitting an order form and ends with invoking one or more back-end systems to handle the delivery, could consist of the following microprocesses:

  • Customer checks: execution of several checks to determine if the bank can and should provide the product to the customer (e.g. criminal record check, credit check, etc.). This could involve orchestration of several calls to back-end systems and even services external to the bank and may involve human intervention to deal with the situation that one or more checks fail (alternate scenarios). The state reached by this microprocess is “customer validated”. In the happy scenario where all checks succeed, all is done in a time span of seconds. But when one or more checks fail and some bank employee must decide, it could take hours or even days (especially with a weekend in between).

  • Generate quote: determining the price and conditions for the ordered product(s). For a customer order for a combination of a current account, a savings account and a credit card this could involve the orchestration of calls to 3 different back-end systems (to get the price and conditions of each individual product) plus a call to some business rule to check if it is a valid product combination, another business rule to determine if some discount should be applied and finally a call to some service to retrieve the conditions for the order. The “quote generated” state is reached when the price and conditions of the order are presented to the customer, either online (in the same session) or by sending a link to some secure inbox to review it later (when the session is already closed). In the happy scenario all is done in a time span of seconds.

  • Sign order: signing of the order by all required signers. This can be anything from signing by one individual for a private account, up to some board of directors of a company in case of a business account. In the latter case the time span might range from minutes up to days or even weeks. The “order signed” state is reached when the order is signed by all signers.

  • Finalize order: execution of steps to persist all data, determine delivery dates, send an order confirmation to the customer and initiate the back-end system(s) to deliver the products. The “order finalized” state is reached when order delivery has started. In the happy scenario this is done in a time span of seconds.

At a higher level there will be a process in which you can clearly recognize each individual step, be that as activities in a structured process flow (BPMN), or as activities in a case management application, like below:

When drilling down in any of the activities you might find a relatively complex structured process model. For example, the structured process backing the Sign Order activity generates the order agreement, handles multiple signers that may sign over a somewhat longer period of time, includes a loop for sending reminders after some deadline has been reached, and handles order cancellation when it has expired.


All states can be reached by the higher-level process in a timespan of a few seconds to minutes, which qualifies it as near “straight-through processing” (near-STP). But in case of issues like some external system being unavailable, human intervention by an Applications Administrator may be required which might not even happen the same day.

Microprocess characteristics

The following characteristics can help with determining the right granularity for your use case:
  • All activities in the same microprocess are tightly coupled to achieving a state of the process that is meaningful to the business. Put differently: when you can’t explain its purpose to a businessperson, it’s not a microprocess.
  • Although the happy scenario might concern near-STP, a microprocess typically involves human intervention for handling alternate or exception scenarios(*). When no human intervention of any kind is applicable, it’s not a microprocess. Therefore, microprocesses are asynchronous by definition.
  • Processing of the average microprocess has a timespan ranging from seconds to days (in case of human intervention). Weeks are very rare exceptions. Therefore, the chance of a future need to “patch” an in-flight microprocess instance (that is: migrate it from one version to the next) is minimal if not non-existent. When in-flight instance migration is expected to be commonly required, it’s too course-grained so don’t implement it as a microprocess.
  • With very few exceptions, a microprocess can be replaced by a newer version without impact on any of its peers. There can be some impact on the higher-level process, which tends to be restricted to its interface (for example, an extra id that needs to be passed on). Vice versa, changes at the higher-level process do not impact a microprocess. When a change has a high probability of impacting a peer, it’s too fine-grained, implying they should be part of the same microprocess.
  • A microprocess is an autonomous deployable unit and can be deployed on a different tier than the higher-level process. Moving a microprocess from one tier to another will only have impact on the endpoint used by the higher-level process.

(*) Alternate scenarios in the end reach the same result as the “happy scenario” but in a different way. Exception scenarios are those when things go wrong and someone (typically an Applications or Systems Administrator) must intervene to put the process back on track.

As the goal of business process automation is to reduce human intervention, the end result might be a process without any human task (mind that this is still a business process). Consequently, a microprocess does not need to have human tasks, but human intervention will still be applicable to recover from exception scenarios.

As in the example given, microprocesses orchestrate human intervention with zero or more (synchronous or asynchronous) “services”. The services orchestrated by a microprocess can in turn be built using different technologies, ranging from synchronous web services (like an OIC Integration) to an asynchronous structured (BPMN) process of its own. However, the latter does not qualify as a microprocess; it’s just another way of implementing a “service”.

Mind that not every activity in the higher-level process necessarily concerns a business process state change. Most business processes include a few “technical” activities for housekeeping-type logic and for keeping track of technical state changes (like “process initiated”, “service errored”, etc.). A typical example is some “Initiate Process” activity (the “Receive Order” in the example) which enriches and persists the data from the start request message and instantiates the internal business object the higher-level process works with. Such activities have no meaning to the business (and therefore do not concern microprocesses), but you can’t leave them out of the model either. They are typically tightly coupled to the higher-level process, tend to be part of the same deployable unit, and are often implemented as structured processes to provide points of recovery in case of issues.

Friday, April 23, 2021

OIC: Working Around not Having Complex Gateway

In this article I describe how you can work around not having the Complex Gateway in OIC Structured Process. I will end with what I believe to be the best workaround with respect to support for refactoring.

The Complex Gateway is one of the least used features of BPMN 2.0. However, when you find a use case for it, you may also find that there is no alternative way to model it, or at least not an easy one. The challenge is that in case of parallel flows (be it via a Parallel or an Inclusive Gateway) the token must move to the merge gateway for each individual flow before it can move further.

One typical use case that I have run into a couple of times is the one where, at a specific point in the process, there is more than one way to do something, and either one of them might happen, after which the process can move on to the next activity. In case there are only two ways, most of the time you can model this by adding an interrupting Boundary Message or Timer Catch Event to the activities.

For example, in the next process there is a Human Task in an order handling process where any one of three events can happen:

  1. The order is handled
  2. The order is cancelled
  3. The order is expired

Obviously, handling is done via the task. Cancellation can be handled by the Message Boundary Event that exposes a "cancel" operation which, when called, will interrupt the task and move to the end. Expiry can be handled via the Timer Boundary Event which, when expired, will interrupt the task and move to the end.

Things become more complex when you have two parallel tasks. In the following example an order can be updated by the Customer via the Update Order task, as long as it has not yet been handled by the Clerk via the Handle Order task. The Customer can also cancel the order, which should withdraw the Handle Order task.


The solution chosen in this model is a Message Boundary Event on each Human Task, one to withdraw the Update Order task and one to withdraw the Handle Order task in case the Customer cancelled the order. It uses an integration that can be called after either one of the tasks is executed, which in turn will call the Message Boundary Event of the other task to withdraw it.

A cleverer solution is a more generic integration that leverages the OIC Tasks API to withdraw any task based on a unique identifier (identificationKey). I could then leave the Message Boundary Events for withdrawing Human Tasks out of the model, but that would not work for other types of activities or events. "Hey, but what about the Event-Based Gateway?", you may think now. Not going to work, because you cannot use that in combination with a Human Task.
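To give an impression of what such a generic "withdraw task" integration does behind the scenes, here is a minimal Python sketch. Mind that the host name is made up, and the exact query parameters and action payload are assumptions modeled on the OIC process Tasks REST API (`/ic/api/process/v1/tasks`); verify them against the API documentation of your OIC version before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical OIC host; replace with your own instance
BASE_URL = "https://myoic.example.com/ic/api/process/v1"

def task_query_url(identification_key: str) -> str:
    # Search for the open task that carries our unique identificationKey
    params = urllib.parse.urlencode(
        {"identificationKey": identification_key, "status": "ASSIGNED"})
    return f"{BASE_URL}/tasks?{params}"

def withdraw_payload() -> bytes:
    # Body of the task action call; the exact shape is an assumption
    return json.dumps({"action": "WITHDRAW"}).encode("utf-8")

def withdraw_task(opener, identification_key: str) -> None:
    # 1. Look up the task id(s) by identificationKey
    with opener.open(task_query_url(identification_key)) as resp:
        items = json.load(resp).get("items", [])
    # 2. Withdraw each match (normally exactly one)
    for task in items:
        req = urllib.request.Request(
            f"{BASE_URL}/tasks/{task['id']}",
            data=withdraw_payload(),
            headers={"Content-Type": "application/json"},
            method="PUT")
        opener.open(req).close()
```

The `opener` would be an `urllib.request` opener (or any HTTP client) configured with the credentials of a user that is allowed to act on other users' tasks.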

As you can see, the second example is much more complex than the first one, even though I left the order expiry use case completely out of the picture. In contrast, the model would look pretty simple if we had a Complex Gateway, even with the expiry use case included (note: I "photo-shopped" the complex gateway 😉):

You will appreciate that with the workarounds mentioned before, adding alternate scenarios (like a third human task, or an operation to support an update via an integration instead of a task) will soon make any model incomprehensible, even for the most experienced BPMN modeler. It just does not scale.

The best approach I could think of is as in the next model, which I have implemented for a real customer case:

 

The trick is that each individual, parallel flow has its own Embedded Subprocess with a Message Boundary Event. The advantage over the previous workarounds is that within the Embedded Subprocess I'm now free to expand or change the logic by adding or removing activities, events or gateways without breaking the interface of the operations of the overall process. The Cancel Polling and Withdraw Task activities both call one single integration that invokes the boundary event of the other one. With a little bit more effort I can extend it to support more parallel flows.

Obviously, adding or deleting any of the parallel flows will require the integration to be changed. Still not ideal, but at least this scales much better than any of the other workarounds.
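Conceptually, that single integration acts as a dispatcher: given the flow that just completed, it triggers the Message Boundary Event of every other parallel flow. The following Python sketch illustrates the idea only; the flow names are made up, and in OIC the callbacks would be invokes of the message operations the process exposes, not local functions:

```python
# Record of the boundary events that were triggered (stands in for the
# actual message operation invokes an OIC integration would perform)
triggered = []

# Map each parallel flow to the callback that fires its Message Boundary Event
BOUNDARY_EVENTS = {
    "polling": lambda instance_id: triggered.append(("cancelPolling", instance_id)),
    "task": lambda instance_id: triggered.append(("withdrawTask", instance_id)),
}

def on_flow_completed(finished_flow: str, instance_id: str) -> None:
    # Interrupt every parallel flow except the one that just completed
    for flow, trigger in BOUNDARY_EVENTS.items():
        if flow != finished_flow:
            trigger(instance_id)
```

This also makes the last point above concrete: adding a third parallel flow means adding an entry to the map, which is exactly why the integration must change whenever flows are added or removed.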