Friday, February 18, 2022

OIC: How-to Do Error Handling When Integrating With Fusion ERP

In the background an Integration with a REST interface may be calling one with a SOAP interface and vice versa. Mapping errors from SOAP to REST or the other way around is not trivial. In this blog article I explain how you can handle this when using OIC Integration in combination with Oracle Fusion Cloud ERP, at least for most of the cases (no guarantee I cover all 😉 ).

The way your Integration returns a fault is determined by a combination of the type of the adapter (SaaS, SOAP or REST) and how you catch an error coming from the back-end service. As explained in the blog post Fault Handling in OIC you have the most control over how the Integration propagates a fault from a back-end service when invoking it within a Scope and catching it using a Fault Handler with a Fault Return. In this article I assume you follow that pattern.

OIC determines the format of the Fault Return. In case of a SOAP fault this will be a SOAP 1.1 Fault for which the values of the elements are predetermined, except for the <detail>. When using Fault Return the fields you defined in the <fault> of the WSDL end up as structured elements in the <detail> of the fault of the integration. As you can see in this example, I used a format that practically wraps a "SOAP fault" in the <detail> of the actual SOAP fault so that I can give my consumer the best information.
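On the wire such a fault could look something like the sketch below; the outer elements are the ones OIC predetermines, and the values inside the <detail> are illustrative (taken from the kind of back-end messages discussed later in this article):

    <env:Fault xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
       <faultcode>env:Server</faultcode>
       <faultstring>(text predetermined by OIC)</faultstring>
       <detail>
          <!-- the fields you defined in the <fault> of the WSDL -->
          <faultCode>JBO-...</faultCode>
          <faultString>short description of the back-end error</faultString>
          <faultActor>Server</faultActor>
          <detail>full back-end error message</detail>
       </detail>
    </env:Fault>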



In case of a REST fault all the elements of the Fault Return are fixed but their values are not. In the following example I return a 406 Not Acceptable with a message indicating that the username is missing in the header:
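Hard-coded, the Fault Return for that example would come down to something like the following (element names as used in the mappings later in this article; the wrapper name depends on your own fault structure, and OIC serializes this to JSON for a JSON-based consumer):

    <fault>
       <type>https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.7</type>
       <title>Not Acceptable</title>
       <detail>The username is missing in the header.</detail>
       <errorCode>406</errorCode>
    </fault>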



When integrating with Fusion ERP Cloud you typically use one or more of the following options (in order of decreasing preference):

  1. Using the Oracle ERP Cloud Adapter
    This is the preferred way because it provides a single Connection for which you only have to configure the ERP Cloud host and credentials. When using it, you can search for or browse through all the business objects it supports and the events you can subscribe to. When invoking a service of the adapter, the fault is returned as a "ServiceException". When using one of its REST resources it returns an APIInvocationError.
  2. Using REST services
    Some REST services provide features not available through the Cloud Adapter, and more and more REST services are added over time (but they are also being added to the Cloud Adapter). In practice you may find a few use cases for them, so brace for ending up with a hybrid situation. It's more work to use a REST service than the Cloud Adapter. You still only configure 1 Connection with the base URL and credentials, but when using it you must figure out, for each individual object, the URI (the last part of the URL that specifies the resource(s) and optional template parameters) and then copy & paste the sample request and response payloads. On the bright side: the fault returned by a REST service is the same APIInvocationError as for any other REST service and, as I explain below, mapping an APIInvocationError is much easier for the REST services than for the Cloud Adapter services.
  3. Using BIP Report Service
    For a few reasons this is a last resort, only to be used when the previous 2 are not an option. The modeled Faults returned by the BIP Report Service are AccessDeniedException, OperationFailedException, InvalidParametersException.

For completeness: ERP Cloud also provides SOAP services, but except for the option of calling them with a tool like SoapUI (for trial & error to determine a request that actually works) they do not offer any added value over the adapter, as the adapter uses the same services in the background. This option is therefore not considered in this article.

When you don't catch and handle a fault from a back-end service, in case of a SOAP Integration your Integration will return a fault in which the <faultstring> contains the generic text "ICS runtime execution error" and the useful fault information is buried in a "blob" of CDATA in the <detail>.


In case of a REST Integration an HTTP status code 500 is returned (no matter what the back-end error was) with, if you are lucky, some information about the actual cause in the errorDetails.title.


All this is not very helpful for the consumer, especially not when the Integrations are used in a UI or by people with little to no knowledge of ERP. In that case you may want to return a fault populated with useful information about the back-end fault. Now how to do that?

ServiceException

The ServiceException is a SOAP fault that wraps a <ServiceErrorMessage> element in its <detail>, which you can map from. There is a pretty straightforward mapping to a modeled SOAP fault, although choices may have to be made.

It has top-level Code, Message and Severity elements, plus a Detail element with a Code, Message and Severity of its own (and a Detail with ... etc., but forget about that). The Code is not always filled out, and often (always?) the Code and Message at the top level are pretty generic and not very useful for the consumer. Therefore, you probably want to use the <detail><message> instead. But in practice perhaps it depends; to be honest I don't know. For the limited use cases I dealt with, I ended up using the <detail><message>. Make sure you figure it out for the type of faults you want to catch.

What I ended up doing is this:

ServiceException -> SOAP fault

  <faultCode>: use a nested choose-when-otherwise to map the <ServiceErrorMessage><detail><code> if there is one, otherwise the <ServiceErrorMessage><code>, otherwise the <ServiceErrorMessage><exceptionClassName>
  <faultString>: the <ServiceErrorMessage><message>
  <faultActor>: "Client" when the issue is at the client side, "Server" otherwise
  <detail>: the <ServiceErrorMessage><detail><message>
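Expressed as XSLT that could look roughly like this. It is a sketch only: namespace prefixes are omitted and the exact source and target paths depend on the WSDLs involved, so take the real paths from the mapper:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <fault>
          <faultCode>
            <xsl:choose>
              <!-- prefer the detail code, then the top-level code, then the exception class -->
              <xsl:when test="//ServiceErrorMessage/detail/code != ''">
                <xsl:value-of select="//ServiceErrorMessage/detail/code"/>
              </xsl:when>
              <xsl:when test="//ServiceErrorMessage/code != ''">
                <xsl:value-of select="//ServiceErrorMessage/code"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="//ServiceErrorMessage/exceptionClassName"/>
              </xsl:otherwise>
            </xsl:choose>
          </faultCode>
          <faultString>
            <xsl:value-of select="//ServiceErrorMessage/message"/>
          </faultString>
          <faultActor>
            <!-- "Client" when the back end flags a client-side issue, "Server" otherwise -->
            <xsl:choose>
              <xsl:when test="contains(//ServiceErrorMessage/code, 'env:Client')">Client</xsl:when>
              <xsl:otherwise>Server</xsl:otherwise>
            </xsl:choose>
          </faultActor>
          <detail>
            <xsl:value-of select="//ServiceErrorMessage/detail/message"/>
          </detail>
        </fault>
      </xsl:template>
    </xsl:stylesheet>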

Makes it look like this:


I could have made it a bit smarter by checking if the string "JBO-" is in the <ServiceErrorMessage><message> and then using the substring before the ":".
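Such a branch would slot into the <faultCode> choose of the sketch above, something like:

    <xsl:when test="contains(//ServiceErrorMessage/message, 'JBO-')">
      <xsl:value-of select="substring-before(//ServiceErrorMessage/message, ':')"/>
    </xsl:when>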

ServiceException -> REST fault

  type: when the <ServiceErrorMessage><code> is "env:Client" then "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1", otherwise "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1"
  title: when the <ServiceErrorMessage><code> is "env:Client" then the title that best fits the actual cause if you are able to derive that from the <ServiceErrorMessage><code>, or - when you are lazy - "Bad Request", otherwise "Internal Server Error"
  detail: the <ServiceErrorMessage><detail><message>
  errorCode: when the <ServiceErrorMessage><code> is "env:Client" then the code that best fits the actual cause, or 400 when you are lazy, otherwise 500
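A sketch of the XSLT behind it (same disclaimer about simplified paths; for the detail I copy the <ServiceErrorMessage><detail><message>, like in the SOAP variant):

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <!-- client-side error when the back end reports env:Client -->
        <xsl:variable name="clientSide"
                      select="contains(//ServiceErrorMessage/code, 'env:Client')"/>
        <fault>
          <type>
            <xsl:choose>
              <xsl:when test="$clientSide">https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1</xsl:when>
              <xsl:otherwise>https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1</xsl:otherwise>
            </xsl:choose>
          </type>
          <title>
            <xsl:choose>
              <xsl:when test="$clientSide">Bad Request</xsl:when>
              <xsl:otherwise>Internal Server Error</xsl:otherwise>
            </xsl:choose>
          </title>
          <detail>
            <xsl:value-of select="//ServiceErrorMessage/detail/message"/>
          </detail>
          <errorCode>
            <xsl:choose>
              <xsl:when test="$clientSide">400</xsl:when>
              <xsl:otherwise>500</xsl:otherwise>
            </xsl:choose>
          </errorCode>
        </fault>
      </xsl:template>
    </xsl:stylesheet>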

Makes it look like this:


APIInvocationError

As mentioned above, a REST error is a straightforward APIInvocationError, with top-level elements "type", "title", "detail", "errorCode" and "errorDetails". In its turn "errorDetails" has subelements "type", "instance", "title", "errorPath" and "errorCode", and these give you the information about the actual back-end error.
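Schematically it looks like this (the comments indicate what you typically find where; illustrative, not an exact payload):

    <APIInvocationError>
       <type/>
       <title/>          <!-- generic message about the invocation having failed -->
       <detail/>
       <errorCode/>
       <errorDetails>
          <type/>
          <instance/>    <!-- the actual error message returned by the back end -->
          <title/>       <!-- short description of the back-end error -->
          <errorPath/>
          <errorCode/>   <!-- status code of the back-end response, e.g. 400 -->
       </errorDetails>
    </APIInvocationError>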

What I ended up doing is this:

APIInvocationError -> SOAP fault

  <faultCode>: APIInvocationError.errorDetails.errorCode
  <faultString>: APIInvocationError.errorDetails.title
  <faultActor>: "Client" when APIInvocationError.errorDetails.errorCode starts with "4", "Server" otherwise
  <detail>: APIInvocationError.errorDetails.instance
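As a rough XSLT sketch (paths simplified, namespace prefixes omitted):

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <fault>
          <faultCode>
            <xsl:value-of select="//APIInvocationError/errorDetails/errorCode"/>
          </faultCode>
          <faultString>
            <xsl:value-of select="//APIInvocationError/errorDetails/title"/>
          </faultString>
          <faultActor>
            <!-- 4xx codes point at the client, anything else at the server -->
            <xsl:choose>
              <xsl:when test="starts-with(//APIInvocationError/errorDetails/errorCode, '4')">Client</xsl:when>
              <xsl:otherwise>Server</xsl:otherwise>
            </xsl:choose>
          </faultActor>
          <detail>
            <xsl:value-of select="//APIInvocationError/errorDetails/instance"/>
          </detail>
        </fault>
      </xsl:template>
    </xsl:stylesheet>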

Makes it look like this:


APIInvocationError -> REST Fault

As I explained in Fault Handling in OIC (https://kettenisblogs.blogspot.com/2020/08/fault-handling-in-oic.html) this concerns a very straightforward mapping:

  type: APIInvocationError.errorDetails.type
  title: APIInvocationError.errorDetails.title
  detail: APIInvocationError.errorDetails.instance
  errorCode: APIInvocationError.errorDetails.errorCode
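A rough XSLT sketch (paths simplified, namespace prefixes omitted):

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <fault>
          <type><xsl:value-of select="//APIInvocationError/errorDetails/type"/></type>
          <title><xsl:value-of select="//APIInvocationError/errorDetails/title"/></title>
          <detail><xsl:value-of select="//APIInvocationError/errorDetails/instance"/></detail>
          <errorCode><xsl:value-of select="//APIInvocationError/errorDetails/errorCode"/></errorCode>
        </fault>
      </xsl:template>
    </xsl:stylesheet>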

Makes it look like this:


BIP Report Service

It is very uncommon to run into any of the modeled faults once you have created and tested an Integration using the BIP Report Service. I remember seeing it only once in many months, and I forgot what caused it. Runtime errors happen often enough: user password expired, BIP report not properly deployed, response size bigger than 1MB, etc.
 
In my experience you can best capture the errors coming from the BIP Report Service using the Default Fault Handler only. This gives you a <fault> with subelements <errorCode>, <reason> and <details>.
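Schematically (content illustrative):

    <fault>
       <errorCode/>   <!-- OIC error code -->
       <reason/>      <!-- short reason -->
       <details/>     <!-- long blob that may contain "CASDK-..." and/or "Fault Reason : ... ]]" -->
    </fault>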
 

BIP Report Service error -> SOAP Fault

This is a pretty straightforward mapping. Not the clearest, but as good as it gets:
 

  <faultCode>: when the <fault><details> contains "CASDK" then the substring before the ":" of the <fault><details>, otherwise the <fault><errorCode>
  <faultString>: when the <fault><details> contains "Fault Reason" then the substring after "Fault Reason : " and before "]]", otherwise the <fault><reason>
  <faultActor>: "Client" when the <fault><details> contains "env:Sender", "Server" otherwise
  <detail>: the <fault><details>
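A rough XSLT sketch of this mapping (paths simplified; the //fault paths refer to the fault of the Default Fault Handler, the literal elements to the modeled fault of the integration):

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <fault>
          <faultCode>
            <xsl:choose>
              <xsl:when test="contains(//fault/details, 'CASDK')">
                <xsl:value-of select="substring-before(//fault/details, ':')"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="//fault/errorCode"/>
              </xsl:otherwise>
            </xsl:choose>
          </faultCode>
          <faultString>
            <xsl:choose>
              <xsl:when test="contains(//fault/details, 'Fault Reason')">
                <xsl:value-of select="substring-before(substring-after(//fault/details, 'Fault Reason : '), ']]')"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="//fault/reason"/>
              </xsl:otherwise>
            </xsl:choose>
          </faultString>
          <faultActor>
            <xsl:choose>
              <xsl:when test="contains(//fault/details, 'env:Sender')">Client</xsl:when>
              <xsl:otherwise>Server</xsl:otherwise>
            </xsl:choose>
          </faultActor>
          <detail>
            <xsl:value-of select="//fault/details"/>
          </detail>
        </fault>
      </xsl:template>
    </xsl:stylesheet>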

  

Makes it look like this:


BIP Report Service error -> REST Fault

Considering how rarely these kinds of errors happen, and what typically causes them when they do, I would not put any effort into finding the proper 4xx error in case of a client-side error:
 

  type: when the <fault><details> contains "env:Sender" then "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1", otherwise "https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1"
  title: when the <fault><details> contains "env:Sender" then "Bad Request", otherwise "Internal Server Error"
  detail: the <fault><details>
  errorCode: when the <fault><details> contains "env:Sender" then "400", otherwise "500"
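The XSLT would be nearly identical to the ServiceException-to-REST sketch earlier on; only the test and the detail differ:

    <!-- client-side when the details blob mentions env:Sender -->
    <xsl:variable name="clientSide" select="contains(//fault/details, 'env:Sender')"/>

    <!-- and the detail element simply copies the whole blob -->
    <detail>
      <xsl:value-of select="//fault/details"/>
    </detail>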

  

Makes it look like this:


Thursday, February 03, 2022

OIC: Parallel Gateway and Multi-Threading, the Work-Around

Since I started using BPMN I have run into the situation twice where I noticed people confusing the BPMN semantics of the Parallel and Inclusive Gateway with the runtime behavior of the engine. In this article I will explain the difference, and how to implement a kind of 'multi-threading' in an Oracle Integration (OIC) Structured Process.

In BPMN 2.0 the Parallel Gateway is a modeling concept and it does not say how it should be implemented by the BPM engine vendor. It might surprise you that not only (current) Oracle Integration but also BPM process engines like Activiti and Camunda do not execute parallel activities exactly at the same time (in parallel threads). Instead they typically wait for each activity to reach a wait state (like a User or Receive activity) before the next one executes, and there are some good reasons for that.

First let me explain what Parallel Gateway (and Inclusive Gateway for that matter) in BPMN 2.0 means. From the OCEB Certification Guide, Second Edition, paragraph 6.6.2 (OCEB 2™ is a professional certification program developed by the Object Management Group):

If the sequence flow is split by a parallel gateway, each outgoing sequence flow receives a token. Conditions in the sequence flows are not permitted. The parallel gateway waits for all tokens for synchronization. In this case, the number of tokens corresponds to the number of incoming sequence flows. It is not specified whether the activities A, B, and C shown in the example of Figure 6.39 are executed at the same time. It is ensured, however, that the flow at the parallel gateway is not continued until all three activities have been completed and the tokens arrived.

As you can see, as far as the Parallel Gateway is concerned BPMN 2.0 does not imply multi-threading. As a matter of fact BPMN 2.0 does not specify how vendors should implement their engine at all, other than that it should comply with the BPMN 2.0 semantics.

Now I'm not from Oracle Product Development, but if I were in their shoes, my reason for not supporting multi-threading (at least not unlimited) in a cloud-native offering like Oracle Integration would be that you don't have any control over the number of threads the customers' applications might instantiate. Obviously that implies there can be surges in memory and CPU usage that might compromise the overall performance or even stability of the environment, and with that the SLA you want to be able to maintain.

Besides performance-related arguments there are also some logical issues. All flows could potentially update the same entity and arrive at the merge at the same time, which implies that either some locking pattern is needed to prevent deadlocks at the merge or (alternatively) the tool should let the developer decide and configure how the merge of data changes should take place. That in turn comes with performance or complexity challenges of its own.

Just to be clear: I'm not claiming there are no BPM engines that support multi-threaded Parallel Gateways, only that the ones I know do not. By now I hope you understand why they might not, and if they do, there probably are consequences, as resources simply are not unlimited.

And still, you may have a use case where you need some sort of 'multi-threaded' parallel execution. For example, a Structured Process is started from a UI and the user expects a user task to be scheduled within a few seconds. Or there are 2 user tasks in a row, both assigned to the same user, who expects a seamless transition from one to the other (what was called a "sticky user" in Oracle BPM 10g). In between, multiple services need to be called that are not all that quick. When the Parallel Gateway - and, by the way, also Integration - does not support multi-threading out of the box, how do you achieve that anyway?

(Spoiler alert!) The short answer is: by calling each synchronous service from its own asynchronous Service Handler, where each Service Handler does a synchronous service call, and having the individual flows of the Parallel Gateway call these Service Handlers.

To elaborate on it, let me start with explaining that the Service Handler is a pattern I use for calling a Service (Integration) from a Structured Process of its own. I will explain it better some other time, but for now it is good enough to know that the Service Handler does nothing else than calling one service and handling any fault it may raise. By doing so you prevent the technical complexity that might be involved in exception handling from being exposed in the main flow, and as a bonus a Service Handler with complex exception handling can easily be copied and reconfigured to call another service, so it is also a development productivity booster. I rest my case.

Normally a Service Handler is implemented as a Reusable Subprocess, but to achieve some sort of parallelism we will use a Structured Process with a Message Start and Message End event, which I call a "Process as-a Service", and that is called from the flows in the Parallel Gateway with a Send/Receive activity combination (as in the picture above).

The beauty of it is that it is a piece of cake to transform a Service Handler from a Reusable Subprocess into a Process as-a Service and likewise, changing the call to it is also very easy.

Reusable Process versus "Process as-a Service"

Now the trick of achieving parallelism this way is that a Receive activity implies a wait state, which means the Receive itself happens in a new transaction. The process engine will first execute the Send activity, right after that schedule the corresponding Receive activity, and in the same thread go to the next flow of the Parallel Gateway, until all of them are in their Receive activity. And then it is ready to start receiving responses.

So all Send activities are still done sequentially, but in the meantime the Service Handler of each one of them can start calling the synchronous service. And because that happens in a process instance of its own, they are done in parallel. Once a synchronous service call is done, the Service Handler does the callback to the Structured Process with the Parallel Gateway. As simple as that!

You now may wonder if and how this solution could compromise the performance and stability of OIC, so what is the caveat? Under high load there will be a performance penalty for sure. The engine must instantiate and handle an extra process instance for every asynchronous Service Handler, and at some point that will start to impact overall performance. However, it should not impact stability. The reason is that Send and Receive activities are message-based, meaning the messages sent to and by the Service Handler are put on a queue from which the engine can pick them up as soon as it can find the time. That is how resource exhaustion is prevented.

The big question now is: when would this work-around start to be interesting to you?

I have done some performance testing and found that pushing logic to a Reusable Subprocess adds some 40ms compared to executing the same logic in the main thread. For me this is small enough not to have to think about whether I should use a Service Handler or not, not even when performance is key. So I always do that. However, when comparing a Process as-a Service to a Reusable Subprocess, the first one adds more than 100ms of overhead. So when there are not that many parallel flows and the services are quick enough, this overhead will not justify calling them asynchronously. However, there will be some turning point where synchronous handling is going to be outpaced in spite of the overhead.

There are 2 dimensions impacting that turning point. The first is the number of services to call. In case of synchronous calls the Parallel Gateway will never be faster than the sum of the processing times of the individual services. In case of asynchronous calls it will never be faster than the time needed to initiate all Send and Receive activities, plus the overhead for using Send/Receive, plus the time used by the slowest service. Which points to the second dimension, being the processing time of the service calls. The more parallel flows involved or the longer the slowest service takes, the more attractive it becomes to do them asynchronously.
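Put as a rough rule of thumb, with t1 ... tn being the processing times of the n services: synchronous execution takes at least t1 + t2 + ... + tn, whereas asynchronous execution takes at least the time to initiate the n Send/Receive pairs, plus the Send/Receive overhead, plus max(t1 ... tn).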

I have created a test application with a setup where I have 4 parallel flows in my gateway. I use an Inclusive instead of a Parallel Gateway. Performance-wise that does not make a difference, but it provides me the option to execute 1, 2, 3 or 4 flows at the same time. All flows call the same synchronous Integration, which in its turn calls an external service that waits a configurable amount of time before returning a response. In this way I can vary on both dimensions as I wish.


As you can imagine I could have spent days trying out all kinds of combinations (they are countless). Like you I also have many other things to do, so I limited myself to trying out and blogging about 2 test cases that I ran in a quiet hour while making sure my wife was not watching Netflix or something. Just to give you some impression of where such a turning point might be. These test cases are:

  1. Initiating 4 parallel flows, each Integration call with the same delay,
  2. Initiating 4 parallel flows, 3 Integrations with a small delay and 1 with a larger delay (the "slowest service").

And then I played with that until I found the point where synchronous and asynchronous were about as fast.

What I found during test 1 is that with 4 parallel flows and the Integration having an average response time of 150ms, both options performed about the same. When the Integrations were faster (e.g. 140ms), synchronous execution was quicker, and when they were slower (e.g. 160ms) asynchronous was quicker.

What I found during test 2 is that with the 3 quicker ones having an average response time of around 125ms, the asynchronous option started to outpace the synchronous one once the slow one took 290ms or more.

Like I said, there are countless combinations I could have tried regarding the number of parallel flows and their response times, and there are also other aspects to consider, like the performance under stress of both the process and the services. So in practice you will have to load test your solution to find out what performs best in your case. Just remember, when using Service Handlers it is very easy to switch from one option to the other. What a great pattern that is!