Wednesday, March 29, 2017

Oracle BPM: Time for Time Out (2)

In a previous blog posting I discussed a solution to re-initiate a scope in BPMN that is supposed to time out after some time. In this posting I discuss how that solution inspires a couple of other use cases where a time out has to be re-initiated by calling an operation on the process.

In the following process model there are three flow, for three different use cases to re-initiate the time-out of:

  1. A process instance (top flow),
  2. An (asynchronous) Receive activity (middle flow),
  3. A User activity (bottom flow).



Re-initiate Timer for Process Instance

The trick here is to use an Event Based gateway that either fires when the time-out occurs, or responds to the call to the re-initiation operation (Reinitiate Requested in the picture) which passes on a new duration. The Timeout Event Gateway is started again, whereby the the new duration is used to (re)schedule the Time Out timer. The reinitiate Gateway is necessary to loop back, and is the default. The condition of the no flow is "false".

The following picture shows the flow when that happens.


Re-initiate Timer for Receive Activity

The re-initiation of the Receive activity happens through a Boundary Message event. The dummy Gateway does not do anything but is necessary to loop back to. The Receive is then rescheduled with a timer that has a new duration as passed on through the call.

The following picture shows the flow when that happens.

Re-initiate Timer for a User Activity

In the previous two examples, the timer is completely (re)scheduled with the passed-on duration. In the bottom example the time-out of the User activity happens by setting the expiration on the Human Task. This is the recommended way as it will make the expiration visible in Workspace, and make sure the Human Workflow Engine properly cleans up the Human Task (which was not always the case in previous releases of the Oracle BPM Suite).

What happens in this scenario is that the expiry is actually not re-initiated but instead paused for a while using an Update activity with operation "Suspend Timers", then wait, and then continue the timer using an Update activity with operation "Resume Timers". This construction allows usage of an (non-interrupting) Event Subprocess, which has the advantage that it does not clutter the rest of the process model, you keep the same Human Task instance (with the same taskId) plus, if you have multiple Human Tasks at the same time, you can also use this construction to suspend other user activities as well.




The following picture shows the flow when that happens.

If you want to re-initiate the timer in a similar way as in the previous two use cases, then you can use the second solution with a Boundary Timer event and a Boundary Message event. The result will be that the Human Task is actually aborted (as said not in some older 11g versions), and then a new instance is created (with a new taskId!). Depending on your process model you can also put the User activity in a scope of its own, and re-initiate the timer of that as described in the previous posting on this topic.

Friday, March 24, 2017

Oracle Weblogic: Tackling Class Loading Issues for SOA Infra

This blog article discusses how to address class loading issues with the Oracle SOA Infra. It's prime "raison d'etre" being a memory dump of something I don't do often, but may spend significant time in finding out how to do it again.

Some time ago I lost valuable time because some library being deployed twice, once in the wrong place ([SOA_HOME]/lib folder) and once in the right place ([SOA_HOME]/soa/modules/oracle.soa.ext_11.1.1). In this particular case the first was wrong because the library was using classes that were only loaded when the SOA infrastructure was initialized.

I had created a composite that relied upon some code from the jar, which I knew should be there, but every time it was called it gave me a NoSuchMethodError. A nasty problem because deployment of the jar file was not done by me, but instead by some Operations department that I could only contact indirectly, and any request could easily take a day to get resolved. Of course I blamed these stupid people from Operations that did not even know how to deploy a jar file properly, and undoubtedly Operations was blaming this idiot calling himself a developer but did not know how to code straight. Polite as we both are, we did not say so to each other of course. Me giving you this anecdote only to point out one of disadvantages of not doing DevOps ;-)

But then came the WebLogic Classloader Analysis Tool (or CAT for short) to the rescue. With that I was able to determine that my jar was loaded from both the lib folder as well as the oracle.soa_ext_11.1.1 but as the first one has preference over the seconds one, my composite always went to the old lib, even though Operations did deploy the latest version to the proper location, So somewhere early in the process Operations did deploy it in the wrong location (ha!), but then again at the time I probably did not give them proper instructions about its location either (hmm...).

There already is enough information to be found about the Classloader Analysis Tool, including this one, so I just will stick to explaining how I found out to find out what is being loaded from the lib folder of the SOA Server and what from the oracle.soa_ext_11.1.1 folder.

To go to CAT use a URL like this; http://[server]:[port]/wls-cat. Make sure you go to the SOA Server, and not the Admin Server (unless that is one and the same). Any class loaded by the SOA infra you can find from soa-infra -> soa-infra -> View: detailed -> Classloader Tree. The jars from the lib folder are loaded by the java.net.URLClassLoader whereas the SOA infra itself (including the external jars) are loaded by the weblogic.utils.classloaders.GenericClassLoader.




Wednesday, March 22, 2017

Oracle BPM: Time for Time Out (1)

In this posting I describe how to time out a specific BPM scope with the option to re-initiate the timer.

In case you need to model a time out for a specific scope within a process where you want to be able to modify the time out run-time, then you can model it similar to this:

A parallel flow is used where the top flow covers the main process, and the bottom flow handles the timeout. To make the timeout configurable, the bottom flow uses an Event Gateway with a Message event to interrupt the timer and re-initiate it again. The first of the two flows that reaches the Complex Merge aborts the other one (first come, first served), as configured in the Complex Merge:

Note: If you want re-initiation to happen based on a Signal, than you cannot use that in an Event Gateway. However, as a work-around you can define a separate component in the composite that is subscribed to the Signal event, and then calls the "Reinitiation Requested" Message Start event.

Time Out Flow

The timer is configured using an expression that results in a duration:

Furthermore you need some variable that is initiated in the Start operation as false, e.g. called a "mainProcessTimesOut":


"mainProcessTimeOut" is set to true in the "Set Timed Out" Script activity, and used in the "timed out?" Exclusive Gateway to go to the "End" or "Timed Out" End event.

Reinitiate Flow

The "Reinitiation Requested" Message Catch event exposes a "reinitiateTimer" operation that takes the new expiry duration as input, plus an id to correlate the instance:


As the "Reinitiation Requested" Message Catch is only activated in case re-initialization of the timer is requested, the condition of the no-flow from "reinitiate?" can simply be set to false, and the yes-flow as the default.

In a follow-up article I will discuss some more patterns for timing out with re-initiation.

Tuesday, March 21, 2017

Oracle BPM: Hiding Faults from BPM? Don't use Service Activity!

In the following I explain how you can hide faults from BPM by not using (synchronous) Service activities, but (asynchronous) Send/Receive activities instead.

When calling services from a BPM process, you should think about where you want faults to show up and to be handled. This is specifically of interest when you have some integration layer between your BPM processes and external services that you call to abstract the external services from the BPM process. Let's call this layer the Service Layer. I have seen such a layer in various formats, ranging from a Reusable Subprocess, a BPEL process in the same composite as the BPM process, or a BPEL process in a separate composite, or instead of BPEL a Mediator. You may have such a layer to hide technical details from the business process, to cover some sort of custom exception handling, or to hide the message format from these external services from the BPM process (or a combination of all that). The latter might be because you don't have the luxury to do message transformation in a service bus.

In case the BPM process calls the Service Layer through a (synchronous) Service activity and that fails, then this will result in the main BPM instance to get into an errored state, and you will have to handle the error in the BPM process. This behavior might be exactly what you wanted to prevent with the Service Layer, for example because the Service call is in a parallel flow and you want to be sure that the fault does not impact processing of the other, parallel threads.

The following example shows what happens. It concerns a main BPM process, that calls synchronous ServicePS from the Service Layer, which on its turn calls some other ServiceA that (finally) calls a FailingService that always fails. The example is a bit over complicated because I configured a fault policy in the synchronous services. You may be aware that I wrote some other article explaining that this is not a good practice, but when creating this example I did not had that insight yet ;-) So bear with me and just ignore these synchronous services still being in a "Running" state after they failed.

The following shows the synchronous BPEL of the ServicePS.


Because the whole chains of calls is synchronous from beginning to the end, you will see that all synchronous services have the "Faulted" state. Because of the fault policy in the BPM (the only one that makes sense in this case) it is still running, but because the fault bubbled up to the BPM instance that shows the error as well.



Now lets refactor this to a solution where the Service Layer will hide the fault from the BPM process. To do so, all calls from the BPM process to the Service Layer will have to be asynchronous.

The following shows the asynchronous BPEL of ServiceAsyncPS_NP. 

Learning from my earlier mistake with the fault policy, this asynchronous service now is the only one in the chain with a fault policy. Because the FailingService failed, also the (synchronous) ServiceA_NP failed. But because ServicePSAsync_PS is asynchronous, that is where it stopped.


The error can be recovered from there, and in the meantime, the BPM process runs like there is no cloud in the sky.


Because of the asynchronous nature of the ServiceLayer, this is not a decision you should take lightly. For example, statefull BPEL cannot be migrated, so any error in it cannot be fixed for running instances. It therefore might not be the silver bullet you were looking for.

Friday, March 17, 2017

Oracle BPM: Loops and Gateway Struggles

If there is one issue that I see people often struggle with, then it is the use of loops in combination with gateways. The following discusses a few cases.

The following picture shows several loops in combination with a Parallel gateway, of which some are valid and some not. The same holds for the Inclusive gateway.

To understand why some loops are valid and other not, you have to realize that at the beginning of a Parallel or Inclusive gateway as many tokens are generated as there are parallel flows that run between the start and end of the gateway. To the BPM engine this translates to 1 or more threads that are instantiated.

No such restrictions are there for an exclusive gateway, because then there is only one token (thread) active at any time.

So in BPMN the following flows are not valid:
  • From "crossover?", because you are going to another thread that may already have passed the point that the flow goes to. However, JDeveloper does not prevent you from doing so.
  • From "loop back inside to beginning", because at the beginning of the gateway new threads would have to be instantiated for flows of which some threads may already run. JDeveloper should fail validation of such a construct.
  • From "loop back inside from outside", because you would then have to go back to a thread already ended in the merge. JDeveloper should fail validation of such a construct.

The flows that are valid in BPMN are:
  • From "loop back inside", as you loop back within the same thread.
  • From "loop back outside to beginning" as you are re-instantiating a new set of threads for which the previous set already ended.

In case the latter does not work apply patch 23230734.

Wednesday, March 08, 2017

Oracle BPM 12c: Hide Implementation Details with the Refine Feature

Ever had a case with the Oracle BPM Suite where you wanted to create a BPMN model while hiding the details from the reader? Then the "refine" feature may be what you are looking for. Read on if you want to know more about this feature that has been added since 12c. I actually blogged about it before, but this time I want to also illustrate the impact it has on the flow trace.

The "refine" feature is a way to detail an activity.  Basically it is a specialization of the (already in 11g present) embedded subprocess. The difference being that - unlike a normal embedded subprocess - the refined activity keeps the icon of the main activity.

To show this difference take the next example where I hide the details of a Script activity being executed before a User activity is scheduled. When I collapse that embedded subprocess it gets a blue color, hiding this technical detail but also that the main activity (still) is the User activity.



This can somewhat be mitigated by changing the icon of the activity, but the options are pretty limited. Furthermore, this deviates from the standard BPMN notation what some readers might find somewhat disruptive.


Now let's have a look at the refine feature. The use case here is a bit different, in that I want to hide from the reader that a User activity in reality is handled by some other application with some asynchronous interface to send the payload (to of what otherwise would be a normal Human Task) via a Send activity, after which I receive the updated payload and outcome via a Receive activity. In case you wonder why on earth I want to do this: the example is inspired by a real customer case where the BPM process orchestrates system and human interactions of which the latter actually are backed by activities in Siebel.

You refine an activity by chosing "Refine" from the right-mouse-click context menu of the activity itself.


The initial result is some sort of an embedded subprocess to which a User activity has automatically been added, however without a Start and End event.


I can now detail this activity by adding a Send and Receive activity to it. Because I don't wamt implement the User activity I put that in draft mode. Before you criticize how ugly this is, consider this: you still may want to express that the Send and Receive actually are a placeholder of something that is not implemented as a Human Task, but still concerns some implementation of what logically is a User activity.


I can compile and deploy this BPM application without any issue, but ... As it turns out it does not work.


Because of what I consider a bug, the refined activity actually does need a Start and End event, just like a regular Embedded Subprocess. The compiler just forgets to tell you.




Not surprising, as you can see the flow trace is not different than that of a regular Embedded Subprocess. And what you can do with it is also the same, as you can tell from the next iteration in which I have implemented some fallback scenario to schedule a User activity whenever the handling by the other application is not done within some time limit.


And despite all these details, I can still present the activity to the reader as a simple User activity, only difference being the + symbol :-)

Wednesday, March 01, 2017

Are MicroServices the Death of BPM and Case Management?

When reading about MicroServices you could get the impression that orchestrated business processes or even case management applications will soon become legacy. I seriously doubt that, considering the challenges you will face with creating a landscape of MicroServices that will be able to support some of the characteristics that gave birth to BPM and Case Management in the first place. Also, Martin Fowler's primary guideline concerning MicroServices is "don't even consider MicroServices unless you have a system that's too complex to manage as a monolith". In the following I discuss the issues you might face with Business Process and Case Management in a pure MicroServices architecture. My conclusion being that MicroServices will not be the death of BPMN or Case Management. On the contrary, it probably is going to help delivering on some of their promises we so far seem not always be able to deliver upon.

Update 23-03-2017: you may also be interested to learn that Netflix (one of the examples you will always find when people point to a successful MicroService implementation) found the need for a Netflix Conductor: a microservices orchestrator.

Business Processes and Cases Are Not MicroServices

Let's face it, BPM is about (stateful) orchestration. MicroServices are supposed to be stateless, and its business capability should not depend on others to complete its work, which makes it like the opposite. In BPMN the order in which activities are executed is prescribed or 'orchestrated' as we say, by 'flows' that go from one point to another. The de facto standard language to express a BPM processes is BPMN, which visualizes this explicitly. With each step the state of the complete flow can be persisted. Service calls should be synchronous when successful completion of the process is dependent on the response, and then errors are handled by the process. In contrast the MicroServices 'design for failure' principle makes them more about 'choreography' and as loosely coupled as possible. Rather than making the working of a MicroService dependent on a synchronous call to another service, communication preferably is based on events. By definition there is no such thing as persisting the 'state of a process', and no over-arching process to handle errors.

Unlike BPMN, Case Management is about choreography, but - much more than a number of interacting MicroServices - still predictable in that you know up-front which type activities may be involved, and the rules that determine this. Similar to BPMN, with CMMN you can visualize this to some extent. And similar to BPM also the state of a case is persisted, supporting that you can see what has been done by whom, what the current running activities are, and - based on the model and the rules - you can predict what might happen next. A successful completion of a case depends upon the completion of the individual activities. So in spite of its characteristic of choreography also Case Management contrasts MicroServices in more than one way.

MicroService Challenges

When thinking about the highly flexible, however for the observer often unpredictable flow of events in case of a MicroServices architecture, where the completion of an instance of one MicroService can trigger any number of instances of other MicroServices, you start to realize some of the challenges you will face with business processes that are only supported by MicroServices including - but not limited to - the following.

Process/Case Introspection

As stated before, one thing a business process and case management support is that you can introspect the state of the process or case. Where is it, what has already happened, and what will/might happen next? To achieve the same with MicroServices you will have to realize some central, coordinating MicroService or Aggregator that somehow has to be fed with the state of MicroService executions, can correlate them in some way, and present them in a context that can be understood by the user. For example, in case of a complex order handling business process (that can span hours our days) this implies that it is able to correlate MicroService executions using some common business indicator like an order id. This implies a dependency of this central MicroService on the other ones to publish the states of their execution with a reference to the order id. That introduces some interesting challenges regarding how to define the bounded context of such a central MicroService and how to implement the anti-corruption layer to make the entities of the individual MicroServices non-intrusive to that of the central one.

But let's ignore that for now. For this central MicroService to be able to present this state to the user so that he/she understands what happened when, why, by whom or what, and what might happen next, it must have some notion of a 'business process' (or case). It might be my lack of imagination, but I cannot picture how this can work as there is no central coordinator to rule them all. A concrete example from my practice is a Move Natural Person process in a bank. Next to a bank account this person might also have a credit card, a mortgage, and several insurances. Some of these product can be moved by just changing the address, but you cannot do that with a mortgage for example. For a bank moving a person or organization is one of the more complex processes, and whenever a customer calls to inquire what the status is, it is imperative for the bank employee to have this overall view. How to know that all relevant MicroServices have been initiated? Of course, I can picture some solution where all MicroServices have to publish events to some central "hub" and from there support some navigation to dashboards of the individual MicroServices, But I also start to see some sort of a dependency that you would try to avoid in a MicroServices architecture.

Process/Case Operation

Operations will have a similar problem as the business has when they have to operate the process or case. If a process is stuck from a technical perspective, in which MicroService is that? Practically also this type of concern can only be addressed when to some extend there is a sort of common way to log errors, collect those and present them in a consolidated way. Also something that is in conflict with the principle of decentralization, as each MicroService is supposed to be operated independently.

Process/Case Modeling and Testing

And what about modeling and testing a process or case? Capturing how a case may evolve over time in CMMN is already more difficult for the reader to understand than a BPMN process design. But how a process would unfold in a pure MicroServices environment you can only understand if you would model that in some similar way. But in a pure MicroServices architecture that does not seem to make any sense. And if you don't model it you surely will have difficulties testing it.

Authorization & Authentication

Another challenge I would like to point out is authorization and authentication. In BPMN there are swimlanes that correspond to roles that you can assign people to. By using a central repository of these roles you can implement a consistent way of authentication and authorization. In Case Management there are similar concepts (e.g. knowledge workers). How to implement this for a process only consisting of MicroServices when this implies a centralized authentication and authorization model?

Granted, MicroServices is relatively new, still in the hype phase, and over time some of these challenges will be addressed. This will result in new patterns, and frameworks and tools to support that. But I seriously doubt this will ever address all the requirements that are naturally addressed by BPM or Case Management. So over time I believe both will survive the MicroServices hype, although I see Case Management gaining ground over BPM.

MicroServices Values for BPM and Case Management

However, all this does not mean there is no value in adapting at least some of the principles related to MicroServices to BPM and Case Management applications. I can see how it could address some of the issues I faced with processes that are almost too big to handle, and issues with reuse of services and the impact that had on agility. Since then I much more tend to:
  • Design and implement sub-processes as deployable units of their own.
  • Push more of the other logic to a deployable unit of its own than I already did.
  • Let data models be less intrusive to integrations (i.e. chose the Anti-Corruption pattern with small Bounded Contexts over the Conformist pattern), and address data mapping challenges in the (anti-corruption layer of the) individual services rather than in some integration layer (smart endpoints / dumb pipes).
  • Apply the Tolerant Reader pattern more that I already did
  • Copy and paste code if that prevents unnecessary impact of a change on some shared component.
And where useful and possible one can implement the services consumed by the business process or case as MicroServices and make the process and these services more loosely coupled. But that I already did. The mantra of 'do one thing and do it well' specifically appeals to me. I always try to prevent creating any service (or Java class for that matter) for which I have to use the word "and" to describe what it does.

Wednesday, February 01, 2017

Oracle SOA/BPM 12c: Contract WSDL Only in MDS?

In this posting I will discuss if it is a good idea to only have a (contract) WSDL in the MDS, and let your implementing composite point to that, instead of having a WSDL in the project itself (as well).

When developing SCA composites with JDeveloper, initially your WSDL will be in the project of the composite. Some people put a contract WSDL in the MDS, and then let the code of the SCA composite point to that (using an "oramds:/" reference), while removing the local one at the same time. The idea behind this that all projects using it, including the service provider itself, use exactly the same WSDL, and with that prevent conflicts. Good thinking but this is what you should know before doing so.

First the good news. 11g required that there was also a local WSDL. If you moved the WSDL to the MDS it would generate a wrapper WSDL that would be used in the code itself. The wrapper would then import the contract WDSL. Apart from the feeling that these wrappers seem to add overhead, I also experienced with some versions of JDeveloper that the wrapper and the contract WSDL could get out-of-sync. Sometimes fixing compilation issues because of that could become a difficult job indeed. With 12c this has improved. The wrapper WSDL is no longer generated, and so far I have not been able to reproduce any of the synchronization issues caused by changes in the project.

However, you may have some issues with control over publishing any update of the contract WSDL. If you need to update the WSDL, for example because you want to add an operation (not applicable to BPEL by the way) you have to change the contract WSDL in the MDS first. If you commit that to your version control system then somebody else could update it, and get the impression that the new operation is ready to use, while you still have to start implementing it.

There are two ways to work around it.

One option is to work with an MDS project that is specific to your composite. Meaning that, instead of 1 single project that you use to deploy all MDS artifacts at the same time, you create a small-scoped MDS project that contains all artifacts for one specific composite. That MDS project you add to the same workspace as the composite itself. So you can change the contract WSDL and work on the implementation without hindering anyone else until they need it. But then they get the new contract WSDL together with the updated composite itself.

Of course this option won't work when you share one single development environment, but that is a bad practice anyway. It also may require a change of the way you deploy the MDS and composites. Using tooling like Maven can help out here.

If this option does not work for you, then consider having a local as well as a contract WSDL. You change the local WSDL first, implement the new operation, and only after you are ready replace the contract WSDL.

In either case it is highly recommended that all public schemas are in the MDS only, before releasing the composite for usage. Otherwise you may encounter runtime issues with clashing element definitions, which you may only encounter when it is too late, for example after a restart of the server.

Friday, August 12, 2016

Oracle BPM 11g/12c: How to Catch an Event in the Same Process

A customer of mine was kind of surprised that when you throw an event in a component of a SCA composite, that the same component cannot catch that event and act upon it. This is a known limitation, for which there is a work-around, which I will discuss in this article.

The work-around is quite simple: another loosely coupled component in the same composite can listen to the event, so all you have to do is to create a BPEL or BPM process as-a-service that is subscribed to the event, and that interacts with the main process that you want to act upon it.

To show that a component cannot listen to its own event, and that the work-around actually works, I used the following test process. No worries, it looks more complex than it is.

The parent process above takes a parameter as input so that I can let it execute either one of the following three scenarios, which consist of throwing an event and then catch it:

  1. In the same (parent) process model
  2. In a reusable sub-process (called through a Call activity)
  3. In a process as-a-service that is called through a Send / Receive activity


There are 4 parallel flows between the OR-gateways:

  • The top flow has a Wait User activity to make it pause and waits for the event.
  • The second flow has a Call Child Call activity which calls the reusable process below.
  • The third flow has the Send/Receive activities to call the process as-a-service below.
  • The bottom one waits 2 seconds to give one of the other flows time to be activated, and then throws either one of two events, depending on whether I want to test catching it in the parent or in the reusable child (for this you cannot use the same event type, that's why).

Only 1 of the first 3 flows is activated at any time, while the last flow (with the events) is always activated. Furthermore the parent process has an Event Sub-process that listens to the event that is thrown by the Throw Internal Event event.

The reusable child is also very basic. It has a User activity to make it pause and wait for the event. It also has a an Event Sub-process that listens to the event that is thrown by the Throw Internal Event for Child event. If it is activated, it will map some variable to itself (to see something concrete in the audit trail), and then it will withdraw the Wait task.
The (child) process as-a-service does the same as the reusable child, except for that it has a start and end event which makes it an asynchronous BPM process as-a-service.
Now when you start an instance of the parent for each of the 3 scenarios, the result in Enterprise Manager is as below:
The instance at the bottom (1450067) belongs to the scenario where the parent tries to catch the event. Which fails as you can see by the fact that it is still running. And yes, I did make sure the Catch Event is correlated properly to the Start Event. The next instance (1450068) is the one that catches it, but as you can see they both are still running. When clicking on the second one, it somehow figured out that both instances are related, but the first instance won't act upon it.

The third instance (1450069) is that of the scenario where the reusable child tries to catch the event. From the fact that there is no other instance, you can see that it does not even listen to the event.

The fourth instance (1450070) is that of the parent that calls the child process as-a-service. The fifth (top) instance (1450071) is that of the child that catches the event, and then calls back the parent instance. As you can see, those are the only two instances that actually completed. So only in this scenario it actually works.

Friday, July 01, 2016

Oracle SOA: Using Sensors on Optional Elements

In this posting I describe an issue you may run into when configuring composite sensors on optional elements, and why it is good practice to always add a filter that checks if the element is actually present.

If you define a sensor on a composite to record elements that are optional, you may find an error in the logs similar to the below:


If that is the case you probably have a composite that takes a request with one or more optional elements, of which one or more are not provided, which then is transformed by an XSLT. I have only seen it in combination with a Mediator, but don't know if it would happen in case of BPEL or BPM as well.

For some reason (a product limitation, if not a bug) it will try to store the sensor with a null value, which will result in the above error. You will not see the error when the input is being mapped using XPath, as then it will not try to store the sensor.

To make it always work, the solution is to add a filter on the value to make it only store the value if the element is actually there. I will now explain how to do that.

Let's assume you have a payload like this:


If you want a sensor on the secondElement, you can add a sensor by right-mouse clicking the service and choose Configure Sensors:


For this optional element you should configure it as follows:


It may look a bit like overkill to always do it, but then again if you do it right away, you no longer have to worry about it later, as it will always work, XSLT or not. 

Wednesday, June 29, 2016

Why in Oracle BPM/SOA Suite Attaching a Fault Policy to a Synchronous Service Is Not a Good Idea

In this posting I will explain why in the Oracle BPM/SOA Suite you should not attach fault policies to synchronous services.

This posting has been updated on July 1 2016 to explain why the recovery of ServiceA failed.

The other day I investigated some BPM process instance that had an unrecoverable error. It was calling a synchronous service, that on its turn was calling another synchronous service, that on its turn was calling a external, synchronous service exposed through the Oracle Service Bus. That latter call failed (due to a timeout). As all the composites were having a fault policy attached to them, it was expected that the instance was recoverable. Instead there was some JTA transaction error that rolled back all the way up to last dehydration point in the (top level) BPM process, and from there was retried two times before it finally gave up and went into a coma. Big surprise!

What I recommended to them to prevent this in the future, was the following:

  1. For all synchronous services detach the fault policy. Only attach fault policies to asynchronous and fire&forget services,
  2. Where possible, do asynchronous calls from the BPM process (instead of synchronous), 
  3. Wherever possible make all synchronous services idempotent or make them asynchronous / fire&forget.

If you want to understand why, read on!

Let's assume we have the following chain of services:


I created a FailingService that I can let succeed or fail depending on the input.

Now what happens when BPMProcess calls ServicePS with request 'fail' is the following:

  1. ServiceA errors because of the fault thrown by FailingService
  2. BPMProcess and ServicePS both error because of a timeout
  3. Because of fault policy attached to all them, all go into recoverable state (human intervention)



As ServiceA is in a recoverable state, why not try to recover and see if that will fix the flow?

Now what happens when a retry is done from ServiceA with payload 'normal' is the following:

  1. ServiceA completes successfully
  2. However, BPMProcess & ServicePS are still in recoverable state


The explanation for this is that, because ServicePS was already in a recoverable state, it will not receive the response from ServiceA, as it is no longer listening.

Now let's see what happens when we try to recover the instance of ProcessPS and change the payload from 'fail' to 'normal'


  1. Although the payload was changed to 'normal', we still end up with a new errored instance of ServiceA, as the request of the call from A to FailingService did not change with it (i.e. still 'fail')



In the meantime I understand why ServiceA still called the FailingService with payload 'fail' (I picked the wrong call to retry from the drop-down), but even if the call would have been successful, the FailingService would have been called twice, and let's just hope it is idempotent!

To prevent we get more of these duplicate calls, we recover the (top level) BPM instance instead.

Now what happens when a retry is done from BPMProces with payload 'normal' is the following:

  1. There are new (successful) calls to ServicePS -> ServiceA -> FailingService
  2. However, there are still running instances of ServicePS and ServiceA (they are still in a recoverable state)



The explanation being that these instances were still running after the previous (failed) attempt. So now we still have to abort these running instances to prevent duplicate calls. All in all not very convenient.

The solution is to never let an asynchronous service use a fault policy that either initiates human intervention or does 1 or more retries. The point being that in the meantime the consumer will have timed out, and never receive the response even it succeeds later on.

The best layer to handle errors with synchronous services is a layer that has 'knowledge' about the context of the process. Normally that is the business process itself. The reason being that a policy that fits one process may not fit another.

On the other hand, there are some good arguments for not letting system errors bubble all the way up to a business process. Instead you should consider handling it in the next layer below it - in this case being ServicePS - by making all calls from the business process to ServicePS either asynchronous, or fire&forget (the latter when successful continuation of the process is not depending upon the call). ServicePS will then handles the error using fault policies. You have two options to recover:

  • You recover the instance of ProcessPS, or (when that fails for whatever reason) 
  • Abort the instance of ProcessPS, and do an alter flow on the business process by moving the token from the Receive back to the Send activity. 

As a matter of fact, this customer actually created this ServicePS as a process-specific layer that sits in between the business process and any other service. A similar layer may not be feasible in your case, in which the solution would be to let the error bubble all the way up to the process instance and handle it there (using fault policies).

Thursday, May 12, 2016

Oracle BPM 12c: Browsing the SOAINFRA

In this article I discuss some tables from the SOAINFRA schema that might be most interesting to use when trying to find out why you don't see in Enterprise Manager what you expect.

Going from 11g to 12c, some things have significantly changed in the SOAINFRA schema. For example, your normal partners in helping with "what happened with my process?" type of queries, like the component_instance, and bpm_process tables, have become obsolete. On the other hand you have new friends with tables like sca_flow_instance, and sca_entity.

The following discusses some tables that you might want to look into when digging in the dirt of the SOA/BPM engine's intestines.

The tables I would like to discuss in more detail are:
- sca_flow_instance
- cube_instance
- wftask
- sca_entity
- bpm_cube_process
- bpm_cube_activity

Given that there is no official documentation on these tables, this is based on my observations an interpretations. No guarantee that these are flawless, so if you have anything to improve or add, let me know!

To better understand the data in the SOAINFRA in relation to an actual process, I used 1 composite with the following processes, that has two subprocesses (another BPM process and a BPEL process). The BPM subprocess has not been implemented as a reusable process (with a Call activity) but instead as a process-as-a-service.






As a side note: originally I created this process to be able to verify how the different states a process and its children can have, are represented in Enterprise Manager. The reason being that on one of my projects there were some doubts if this is always correct, given some issues in the past with 11g. With 12c I could find none. However, as the test case does not concern inter-composite interaction, nor does it include all types of technologies, you could argue that the test case is too limited to conclude anything from it. Also worth to mention is that the instances are ran on a server in development mode, and without in-memory optimization. I have heard rumors that you will observer different behavior when you disabled auditing completely. In some next posting I hope to discuss that as well.

I initiated several instances, for each possible state one:


sca_flow_instance

As the name already suggests, this table contains 1 entry for each flow instance. You might be interested in the following columns:
  •   flow_id
  •   title
  •   active_component_instances
  •   recoverable_faults
  •   created_time
  •   updated_time

When queried this looks similar to this:

The query used is like this:

select sfi.flow_id
,      sfi.title
,      sfi.active_component_instances
,      sfi.recoverable_faults
,      sfi.created_time
,      sfi.updated_time
from  sca_flow_instance sfi
order by sfi.created_time

cube_instance

This table contains 1 entry for each component instance in the flow (e.g. bpmn, bpel). You might be interested in the following columns:
  • flow_id
  • composite_label (*)
  • cpst_inst_created_time (**)
  • composite_name
  • composite_revision
  • component_name
  • componenttype
  • state (of the component <== mention)
  • creation_date (incl time)
  • modify_date (incl time)
  • conversation_id

(*) corresponds with the bpm_cube_process.scalabel
(**) equals sca_flow_instance.created_time

When queried this looks similar to this:

The query used is like this:

select cis.flow_id
,      cis.componenttype
,      cis.component_name
,      cis.state
from   cube_instance cis
order by cis.flow_id


wftask


This table contains an entry for each open process activity and open or closed human activity. You might be interested in the following columns:
  • flow_id
  • instanceid
  • processname
  • accesskey (not for human tasks) (*)
  • createddate
  • updateddate
  • (only in case of human tasks, the flex fields)
  • componentname
  • compositename (not for human tasks)
  • conversationid
  • componenttype (***)
  • activityname
  • activityid (****)
  • component_instance_id (only for human tasks)
  • state (*****)

(*) : the type of activity, e.g. USER_TASK, INCLUSIVE_GATEWAY, END_EVENT
(**) not for human tasks
(***) e.g. Workflow, BPMN
(****) Corresponds with the activityid of bpm_cube_activity. The user activity and its corresponding human task appear to have the same activityid. After the human task is completed, the user activity disappears but the human task is kept with an null state.
(*****) e.g. OPEN for running activities, ASSIGNED for running human tasks. Other states are ABORTED, PENDING_MIGRATION_SUSPENDED, ERRORED, etc.

When queried this looks similar to this:


The query used is like this:

select wft.instanceid
,      wft.processname
,      wft.accesskey
,      wft.createddate
,      wft.updateddate
,      wft.componentname
,      wft.compositename
,      wft.conversationid
,      wft.componenttype
,      wft.activityname
,      wft.activityid
,      wft.component_instance_id
,      wft.state
from   wftask wft
where  wft.flow_id = 130001
order by wft.updateddate

sca_entity

This table contains an entry for each SCA entity (e.g. service, wire). The following column might be of use:
  •  id
  •  composite (name)
  •  label (corresponds with the scalabel of bpm_cube_process)

When queried this looks similar to this:


The query used is like this:

select sen.composite
,      sen.id
,      sen.label
from   sca_entity sen
where  sen.composite = 'FlowState'
order by sen.composite

bpm_cube_process

This table contains metadata. For each deployed composite it contains an entry for each BPM process. If 2 BPM processes in once composite: 2 entries. The following columns might be of use:
  • domainname
  • compositename
  • revision
  • processid
  • processname
  • scalabel
  • compositedn
  • creationdate  (incl time)
  • undeploydate
  • migrationstatus (*)
(*) Values are LATEST, MIGRATED.

When queried this looks similar to this:



The query used is like this:


select bcp.domainname
,      bcp.compositename
,      bcp.revision
,      bcp.processname
,      bcp.processid
,      bcp.scalabel
,      bcp.compositedn
,      bcp.creationdate
,      bcp.undeploydate
,      bcp.migrationstatus
from   bpm_cube_process bcp
where  bcp.compositename = 'FlowState'
order by bcp.processname
,        bcp.creationdate


bpm_cube_activity

This table contains metadata, There is an entry for each individual activity, event, and gateway of a bpmn process. The following column might be of use:
  • processid (corresponds with the bpm_cube_process.processid)
  • activityid
  • activityname (technical, internal name can be found in the .bpmn source)
  • activitytype (e.g. START_EVENT, SCRIPT_TASK, CALL_ACTIVITY, etc.)
  • label (name as in the BPMN diagram)
The rows in the example below have been queried by a join with the bpm_cube_process table on processid, where undeploydate is not null and migrationstatus is 'LATEST' to get only the activities of the last revision of one particular process:


The query used is like this:

select cbi.flow_id
,      cbi.composite_label
,      cbi.cpst_inst_created_time
,      cbi.composite_name
,      cbi.composite_revision
,      cbi.component_name
,      cbi.componenttype
,      cbi.state
,      cbi.creation_date
,      cbi.modify_date
,      cbi.conversation_id
from   cube_instance cbi
order by cbi.creation_date

Obsolete Tables

The following table have become obsolete:
  • bpm_activity
  • bpm_activity_instance
  • bpm_cube_activity_instance
  • bpm_process
  • component_instance
The composite_instance is still used, but more or less superseded by the sca_flow_instance (although the number of instances are not the same). I do not longer find it useful to query.