Wednesday, March 29, 2017

Oracle BPM: Time for Time Out (2)

In a previous blog posting I discussed a solution to re-initiate a scope in BPMN that is supposed to time out after some time. In this posting I discuss how that solution inspires a couple of other use cases where a time out has to be re-initiated by calling an operation on the process.

In the following process model there are three flows, for three different use cases to re-initiate the time-out of:

  1. A process instance (top flow),
  2. An (asynchronous) Receive activity (middle flow),
  3. A User activity (bottom flow).

Re-initiate Timer for Process Instance

The trick here is to use an Event Based gateway that either fires when the time-out occurs, or responds to the call to the re-initiation operation (Reinitiate Requested in the picture) which passes on a new duration. The Timeout Event Gateway is started again, whereby the new duration is used to (re)schedule the Time Out timer. The reinitiate Gateway is necessary to loop back; its yes-flow is the default, and the condition of its no-flow is "false".

The following picture shows the flow when that happens.
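The loop around the Event Based gateway can be mimicked outside BPM. The following is only a rough sketch in Python, not Oracle BPM code: the function name `wait_with_reinitiation` and the queue that stands in for the "Reinitiate Requested" message event are assumptions made for illustration.

```python
import queue


def wait_with_reinitiation(messages, initial_duration):
    """Mimic the event-based gateway: each round either the timer fires
    or a re-initiation message arrives carrying a new duration (seconds).

    Returns the number of times the timer was re-initiated before it fired.
    """
    duration = initial_duration
    reinitiations = 0
    while True:
        try:
            # Wait for a "reinitiate" message, but no longer than the timer.
            duration = messages.get(timeout=duration)
        except queue.Empty:
            # No message arrived in time: the time-out fires.
            return reinitiations
        # A message arrived: loop back and reschedule with the new duration.
        reinitiations += 1


# Usage: two re-initiation requests are already pending, then silence.
requests = queue.Queue()
requests.put(0.05)
requests.put(0.01)
reinitiated = wait_with_reinitiation(requests, initial_duration=0.05)
print(reinitiated)
```

Whichever event comes first wins the round, and only the time-out ends the loop, which is the same role the default yes-flow plays in the model.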

Re-initiate Timer for Receive Activity

The re-initiation of the Receive activity happens through a Boundary Message event. The dummy Gateway does not do anything but is necessary to loop back to. The Receive is then rescheduled with a timer that has a new duration as passed on through the call.

The following picture shows the flow when that happens.

Re-initiate Timer for a User Activity

In the previous two examples, the timer is completely (re)scheduled with the passed-on duration. In the bottom example the time-out of the User activity happens by setting the expiration on the Human Task. This is the recommended way as it will make the expiration visible in Workspace, and make sure the Human Workflow Engine properly cleans up the Human Task (which was not always the case in previous releases of the Oracle BPM Suite).

What happens in this scenario is that the expiry is actually not re-initiated but instead paused for a while: an Update activity with operation "Suspend Timers" suspends the timer, then the process waits, and then an Update activity with operation "Resume Timers" continues the timer. This construction allows usage of a (non-interrupting) Event Subprocess, which has several advantages: it does not clutter the rest of the process model, you keep the same Human Task instance (with the same taskId), and if you have multiple Human Tasks at the same time, you can use this construction to suspend other user activities as well.

The following picture shows the flow when that happens.
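To make the difference between pausing and rescheduling concrete, here is a minimal sketch in Python. It assumes that resuming continues with the time that was left at the moment of suspension; the class `PausableExpiry` and its methods are hypothetical and do not correspond to any Human Workflow API.

```python
class PausableExpiry:
    """Expiry that can be suspended and resumed, keeping the remaining time."""

    def __init__(self, now, duration):
        self.expires_at = now + duration
        self.remaining = None  # only set while the timer is suspended

    def suspend(self, now):
        # Freeze the timer: remember how much time was left.
        self.remaining = self.expires_at - now
        self.expires_at = None

    def resume(self, now):
        # Continue with the time that was left at suspension.
        self.expires_at = now + self.remaining
        self.remaining = None

    def is_expired(self, now):
        # A suspended timer never expires.
        return self.expires_at is not None and now >= self.expires_at


expiry = PausableExpiry(now=0, duration=60)  # would expire at t=60
expiry.suspend(now=20)                       # 40 seconds are left
expired_while_suspended = expiry.is_expired(now=45)
expiry.resume(now=50)                        # paused for 30 seconds
```

Under this assumption the expiration simply shifts by the length of the pause, without a new timer (or a new task instance) being created.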

If you want to re-initiate the timer in a similar way as in the previous two use cases, then you can use the second solution with a Boundary Timer event and a Boundary Message event. The result will be that the Human Task is actually aborted (as said, not in some older 11g versions), and then a new instance is created (with a new taskId!). Depending on your process model you can also put the User activity in a scope of its own, and re-initiate the timer of that as described in the previous posting on this topic.

Friday, March 24, 2017

Oracle Weblogic: Tackling Class Loading Issues for SOA Infra

This blog article discusses how to address class loading issues with the Oracle SOA Infra. Its prime "raison d'être" is being a memory dump of something I don't do often, but on which I may spend significant time finding out how to do it again.

Some time ago I lost valuable time because some library was deployed twice, once in the wrong place ([SOA_HOME]/lib folder) and once in the right place ([SOA_HOME]/soa/modules/oracle.soa.ext_11.1.1). In this particular case the first was wrong because the library was using classes that were only loaded when the SOA infrastructure was initialized.

I had created a composite that relied upon some code from the jar, which I knew should be there, but every time it was called it gave me a NoSuchMethodError. A nasty problem because deployment of the jar file was not done by me, but instead by some Operations department that I could only contact indirectly, and any request could easily take a day to get resolved. Of course I blamed these stupid people from Operations that did not even know how to deploy a jar file properly, and undoubtedly Operations was blaming this idiot who called himself a developer but did not know how to code straight. Polite as we both are, we did not say so to each other of course. I give you this anecdote only to point out one of the disadvantages of not doing DevOps ;-)

But then came the WebLogic Classloader Analysis Tool (or CAT for short) to the rescue. With that I was able to determine that my jar was loaded from both the lib folder and the oracle.soa.ext_11.1.1 folder, but as the first one has preference over the second one, my composite always went to the old lib, even though Operations did deploy the latest version to the proper location. So somewhere early in the process Operations did deploy it in the wrong location (ha!), but then again at the time I probably did not give them proper instructions about its location either (hmm...).

There already is enough information to be found about the Classloader Analysis Tool, including this one, so I will just stick to explaining how I found out what is being loaded from the lib folder of the SOA Server and what from the oracle.soa.ext_11.1.1 folder.

To go to CAT use a URL like this: http://[server]:[port]/wls-cat. Make sure you go to the SOA Server, and not the Admin Server (unless that is one and the same). Any class loaded by the SOA infra you can find from soa-infra -> soa-infra -> View: detailed -> Classloader Tree. The jars from the lib folder are loaded by the whereas the SOA infra itself (including the external jars) is loaded by the weblogic.utils.classloaders.GenericClassLoader.
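Why the copy in the lib folder wins can be illustrated with a toy model of parent-first delegation, the standard Java class loading model. This is only a sketch in Python: the `Loader` class, the class name `com.example.Util` and the assumption that the loader of the lib folder sits above the loader of the SOA infra are all made up for illustration.

```python
class Loader:
    """Toy class loader with parent-first delegation."""

    def __init__(self, name, classes, parent=None):
        self.name = name
        self.classes = classes  # class name -> version found on this loader's path
        self.parent = parent

    def find(self, class_name):
        # Ask the parent first; only look locally if no parent has the class.
        if self.parent is not None:
            found = self.parent.find(class_name)
            if found is not None:
                return found
        if class_name in self.classes:
            return (self.name, self.classes[class_name])
        return None


# Hypothetical contents: an old copy in the lib folder, the new one in the ext folder.
lib_loader = Loader("lib folder", {"com.example.Util": "old"})
soa_loader = Loader("oracle.soa.ext", {"com.example.Util": "new"}, parent=lib_loader)

winner = soa_loader.find("com.example.Util")
print(winner)  # ('lib folder', 'old')
```

As long as the old copy stays visible higher up in the tree, deploying a newer version further down has no effect, which matches the NoSuchMethodError described above.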

Wednesday, March 22, 2017

Oracle BPM: Time for Time Out (1)

In this posting I describe how to time out a specific BPM scope with the option to re-initiate the timer.

In case you need to model a time out for a specific scope within a process, where you want to be able to modify the time out at run time, then you can model it similar to this:

A parallel flow is used where the top flow covers the main process, and the bottom flow handles the timeout. To make the timeout configurable, the bottom flow uses an Event Gateway with a Message event to interrupt the timer and re-initiate it again. The first of the two flows that reaches the Complex Merge aborts the other one (first come, first served), as configured in the Complex Merge:

Note: If you want re-initiation to happen based on a Signal, then you cannot use that in an Event Gateway. However, as a work-around you can define a separate component in the composite that is subscribed to the Signal event, and then calls the "Reinitiation Requested" Message Start event.
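The "first come, first served" behavior of the Complex Merge in the parallel flow resembles waiting for the first of two concurrent tasks and cancelling the other. A minimal sketch with Python's asyncio, in which the function names and the durations are assumptions chosen for illustration:

```python
import asyncio


async def main_flow(seconds):
    # Stands in for the top flow: the actual work of the scope.
    await asyncio.sleep(seconds)
    return "completed"


async def timeout_flow(seconds):
    # Stands in for the bottom flow: the timer.
    await asyncio.sleep(seconds)
    return "timed out"


async def run_scope(work_seconds, timeout_seconds):
    tasks = [
        asyncio.create_task(main_flow(work_seconds)),
        asyncio.create_task(timeout_flow(timeout_seconds)),
    ]
    # First come, first served: continue as soon as one flow arrives.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # abort the flow that lost
    return done.pop().result()


result_fast = asyncio.run(run_scope(work_seconds=0.01, timeout_seconds=0.2))
result_slow = asyncio.run(run_scope(work_seconds=0.2, timeout_seconds=0.01))
print(result_fast, result_slow)
```

The sketch leaves out the re-initiation itself; it only shows how the first flow to reach the merge decides the outcome and aborts the other.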

Time Out Flow

The timer is configured using an expression that results in a duration:

Furthermore you need some variable that is initialized as false in the Start operation, e.g. called "mainProcessTimeOut":

"mainProcessTimeOut" is set to true in the "Set Timed Out" Script activity, and used in the "timed out?" Exclusive Gateway to go to the "End" or "Timed Out" End event.

Reinitiate Flow

The "Reinitiation Requested" Message Catch event exposes a "reinitiateTimer" operation that takes the new expiry duration as input, plus an id to correlate the instance:

As the "Reinitiation Requested" Message Catch is only activated in case re-initialization of the timer is requested, the condition of the no-flow from "reinitiate?" can simply be set to false, and the yes-flow as the default.

In a follow-up article I will discuss some more patterns for timing out with re-initiation.

Tuesday, March 21, 2017

Oracle BPM: Hiding Faults from BPM? Don't use Service Activity!

In the following I explain how you can hide faults from BPM by not using (synchronous) Service activities, but (asynchronous) Send/Receive activities instead.

When calling services from a BPM process, you should think about where you want faults to show up and to be handled. This is specifically of interest when you have some integration layer between your BPM processes and external services that you call to abstract the external services from the BPM process. Let's call this layer the Service Layer. I have seen such a layer in various formats, ranging from a Reusable Subprocess, a BPEL process in the same composite as the BPM process, or a BPEL process in a separate composite, or instead of BPEL a Mediator. You may have such a layer to hide technical details from the business process, to cover some sort of custom exception handling, or to hide the message format from these external services from the BPM process (or a combination of all that). The latter might be because you don't have the luxury to do message transformation in a service bus.

In case the BPM process calls the Service Layer through a (synchronous) Service activity and that fails, then this will result in the main BPM instance to get into an errored state, and you will have to handle the error in the BPM process. This behavior might be exactly what you wanted to prevent with the Service Layer, for example because the Service call is in a parallel flow and you want to be sure that the fault does not impact processing of the other, parallel threads.

The following example shows what happens. It concerns a main BPM process that calls the synchronous ServicePS from the Service Layer, which in turn calls some other ServiceA that (finally) calls a FailingService that always fails. The example is a bit overcomplicated because I configured a fault policy in the synchronous services. You may be aware that I wrote some other article explaining that this is not a good practice, but when creating this example I did not have that insight yet ;-) So bear with me and just ignore these synchronous services still being in a "Running" state after they failed.

The following shows the synchronous BPEL of the ServicePS.

Because the whole chain of calls is synchronous from beginning to end, you will see that all synchronous services have the "Faulted" state. Because of the fault policy in the BPM process (the only one that makes sense in this case) the instance is still running, but because the fault bubbled up to the BPM instance, that shows the error as well.

Now let's refactor this to a solution where the Service Layer will hide the fault from the BPM process. To do so, all calls from the BPM process to the Service Layer will have to be asynchronous.

The following shows the asynchronous BPEL of ServiceAsyncPS_NP. 

Learning from my earlier mistake with the fault policy, this asynchronous service now is the only one in the chain with a fault policy. Because the FailingService failed, also the (synchronous) ServiceA_NP failed. But because ServicePSAsync_PS is asynchronous, that is where it stopped.

The error can be recovered from there, and in the meantime, the BPM process runs like there is no cloud in the sky.

Because of the asynchronous nature of the Service Layer, this is not a decision you should take lightly. For example, stateful BPEL cannot be migrated, so any error in it cannot be fixed for running instances. It therefore might not be the silver bullet you were looking for.
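The difference between the two styles of calling can be sketched in a few lines of Python. This is not BPM or BPEL code; the function names and the queue that stands in for the asynchronous interface of the Service Layer are assumptions for illustration only.

```python
import queue


def failing_service(payload):
    raise RuntimeError("FailingService always fails")


def bpm_process_sync(payload):
    # Synchronous Service activity: the fault travels back up the call chain
    # and the process itself ends up in an errored state.
    try:
        failing_service(payload)
        return "completed"
    except RuntimeError:
        return "errored"


def bpm_process_async(payload, requests):
    # Asynchronous Send activity: hand over the message and move on.
    requests.put(payload)
    return "running"


def service_layer_worker(requests, recoverable):
    # The service layer picks up the message later; a fault stays here,
    # where it can be recovered, and never reaches the process.
    while not requests.empty():
        payload = requests.get()
        try:
            failing_service(payload)
        except RuntimeError as fault:
            recoverable.append((payload, str(fault)))


requests, recoverable = queue.Queue(), []
sync_state = bpm_process_sync("order-1")
async_state = bpm_process_async("order-1", requests)
service_layer_worker(requests, recoverable)
print(sync_state, async_state, recoverable)
```

In the asynchronous variant the caller never sees the exception; the price is that somebody has to look after the list of recoverable faults.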

Friday, March 17, 2017

Oracle BPM: Loops and Gateway Struggles

If there is one issue that I see people often struggle with, then it is the use of loops in combination with gateways. The following discusses a few cases.

The following picture shows several loops in combination with a Parallel gateway, of which some are valid and some not. The same holds for the Inclusive gateway.

To understand why some loops are valid and others not, you have to realize that at the beginning of a Parallel or Inclusive gateway as many tokens are generated as there are parallel flows that run between the start and end of the gateway. To the BPM engine this translates to 1 or more threads that are instantiated.

No such restrictions are there for an exclusive gateway, because then there is only one token (thread) active at any time.

So in BPMN the following flows are not valid:
  • From "crossover?", because you are going to another thread that may already have passed the point that the flow goes to. However, JDeveloper does not prevent you from doing so.
  • From "loop back inside to beginning", because at the beginning of the gateway new threads would have to be instantiated for flows of which some threads may already run. JDeveloper should fail validation of such a construct.
  • From "loop back inside from outside", because you would then have to go back to a thread already ended in the merge. JDeveloper should fail validation of such a construct.

The flows that are valid in BPMN are:
  • From "loop back inside", as you loop back within the same thread.
  • From "loop back outside to beginning" as you are re-instantiating a new set of threads for which the previous set already ended.

In case the latter does not work, apply patch 23230734.
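The token reasoning can be made tangible with a toy model. The sketch below is an assumption-laden simplification in Python (the class `ParallelGateway` is hypothetical and says nothing about how the BPM engine is implemented): the split hands out one token per flow, and the merge only fires once every flow has delivered its token.

```python
class ParallelGateway:
    """Toy model: the split emits one token per outgoing flow and the merge
    fires once it has received a token from every flow."""

    def __init__(self, flows):
        self.flows = list(flows)
        self.arrived = set()

    def split(self):
        # Start a fresh set of tokens, one per parallel flow.
        self.arrived.clear()
        return list(self.flows)

    def merge(self, flow):
        # Register the token of one flow; True when all tokens are in.
        self.arrived.add(flow)
        return self.arrived == set(self.flows)


gateway = ParallelGateway(["A", "B"])
tokens = gateway.split()
first = gateway.merge("A")      # flow B is still running
second = gateway.merge("B")     # all tokens are in: the merge fires
tokens_again = gateway.split()  # "loop back outside to beginning": a fresh set
```

Looping back within one flow just delays the moment that flow calls the merge, whereas jumping to the split while tokens are still on their way would hand out a second token for a flow that has not finished yet.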

Wednesday, March 08, 2017

Oracle BPM 12c: Hide Implementation Details with the Refine Feature

Ever had a case with the Oracle BPM Suite where you wanted to create a BPMN model while hiding the details from the reader? Then the "refine" feature may be what you are looking for. Read on if you want to know more about this feature that has been added since 12c. I actually blogged about it before, but this time I want to also illustrate the impact it has on the flow trace.

The "refine" feature is a way to detail an activity. Basically it is a specialization of the embedded subprocess (already present in 11g). The difference is that, unlike a normal embedded subprocess, the refined activity keeps the icon of the main activity.

To show this difference take the next example where I hide the details of a Script activity being executed before a User activity is scheduled. When I collapse that embedded subprocess it gets a blue color, hiding this technical detail but also that the main activity (still) is the User activity.

This can somewhat be mitigated by changing the icon of the activity, but the options are pretty limited. Furthermore, this deviates from the standard BPMN notation, which some readers might find somewhat disruptive.

Now let's have a look at the refine feature. The use case here is a bit different, in that I want to hide from the reader that a User activity in reality is handled by some other application with some asynchronous interface to send the payload (of what otherwise would be a normal Human Task) via a Send activity, after which I receive the updated payload and outcome via a Receive activity. In case you wonder why on earth I want to do this: the example is inspired by a real customer case where the BPM process orchestrates system and human interactions of which the latter actually are backed by activities in Siebel.

You refine an activity by choosing "Refine" from the right-mouse-click context menu of the activity itself.

The initial result is some sort of an embedded subprocess to which a User activity has automatically been added, however without a Start and End event.

I can now detail this activity by adding a Send and Receive activity to it. Because I don't want to implement the User activity I put that in draft mode. Before you criticize how ugly this is, consider this: you still may want to express that the Send and Receive actually are a placeholder of something that is not implemented as a Human Task, but still concerns some implementation of what logically is a User activity.

I can compile and deploy this BPM application without any issue, but ... As it turns out it does not work.

Because of what I consider a bug, the refined activity actually does need a Start and End event, just like a regular Embedded Subprocess. The compiler just forgets to tell you.

Not surprisingly, as you can see the flow trace is no different from that of a regular Embedded Subprocess. And what you can do with it is also the same, as you can tell from the next iteration in which I have implemented some fallback scenario to schedule a User activity whenever the handling by the other application is not done within some time limit.

And despite all these details, I can still present the activity to the reader as a simple User activity, only difference being the + symbol :-)

Wednesday, March 01, 2017

Are MicroServices the Death of BPM and Case Management?

When reading about MicroServices you could get the impression that orchestrated business processes or even case management applications will soon become legacy. I seriously doubt that, considering the challenges you will face with creating a landscape of MicroServices that will be able to support some of the characteristics that gave birth to BPM and Case Management in the first place. Also, Martin Fowler's primary guideline concerning MicroServices is "don't even consider MicroServices unless you have a system that's too complex to manage as a monolith". In the following I discuss the issues you might face with Business Process and Case Management in a pure MicroServices architecture. My conclusion is that MicroServices will not be the death of BPM or Case Management. On the contrary, they are probably going to help deliver on some of the promises that so far we have not always been able to deliver upon.

Update 23-03-2017: you may also be interested to learn that Netflix (one of the examples you will always find when people point to a successful MicroService implementation) found the need for a Netflix Conductor: a microservices orchestrator.

Business Processes and Cases Are Not MicroServices

Let's face it, BPM is about (stateful) orchestration. MicroServices are supposed to be stateless, and their business capability should not depend on others to complete their work, which makes them pretty much the opposite. In BPM the order in which activities are executed is prescribed or 'orchestrated' as we say, by 'flows' that go from one point to another. The de facto standard language to express a BPM process is BPMN, which visualizes this explicitly. With each step the state of the complete flow can be persisted. Service calls should be synchronous when successful completion of the process is dependent on the response, and then errors are handled by the process. In contrast the MicroServices 'design for failure' principle makes them more about 'choreography' and as loosely coupled as possible. Rather than making the working of a MicroService dependent on a synchronous call to another service, communication preferably is based on events. By definition there is no such thing as persisting the 'state of a process', and no over-arching process to handle errors.

Unlike BPMN, Case Management is about choreography, but (much more than a number of interacting MicroServices) still predictable in that you know up-front which types of activities may be involved, and the rules that determine this. Similar to BPMN, with CMMN you can visualize this to some extent. And similar to BPM also the state of a case is persisted, supporting that you can see what has been done by whom, what the current running activities are, and, based on the model and the rules, you can predict what might happen next. A successful completion of a case depends upon the completion of the individual activities. So in spite of its characteristic of choreography also Case Management contrasts with MicroServices in more than one way.

MicroService Challenges

When thinking about the highly flexible, however for the observer often unpredictable flow of events in case of a MicroServices architecture, where the completion of an instance of one MicroService can trigger any number of instances of other MicroServices, you start to realize some of the challenges you will face with business processes that are only supported by MicroServices including - but not limited to - the following.

Process/Case Introspection

As stated before, one thing a business process and case management support is that you can introspect the state of the process or case. Where is it, what has already happened, and what will/might happen next? To achieve the same with MicroServices you will have to realize some central, coordinating MicroService or Aggregator that somehow has to be fed with the state of MicroService executions, can correlate them in some way, and present them in a context that can be understood by the user. For example, in case of a complex order handling business process (that can span hours or days) this implies that it is able to correlate MicroService executions using some common business indicator like an order id. This implies a dependency of this central MicroService on the other ones to publish the states of their execution with a reference to the order id. That introduces some interesting challenges regarding how to define the bounded context of such a central MicroService and how to implement the anti-corruption layer to make the entities of the individual MicroServices non-intrusive to that of the central one.

But let's ignore that for now. For this central MicroService to be able to present this state to the user so that he/she understands what happened when, why, by whom or what, and what might happen next, it must have some notion of a 'business process' (or case). It might be my lack of imagination, but I cannot picture how this can work as there is no central coordinator to rule them all. A concrete example from my practice is a Move Natural Person process in a bank. Next to a bank account this person might also have a credit card, a mortgage, and several insurances. Some of these products can be moved by just changing the address, but you cannot do that with a mortgage for example. For a bank moving a person or organization is one of the more complex processes, and whenever a customer calls to inquire what the status is, it is imperative for the bank employee to have this overall view. How to know that all relevant MicroServices have been initiated? Of course, I can picture some solution where all MicroServices have to publish events to some central "hub" and from there support some navigation to dashboards of the individual MicroServices. But I also start to see some sort of a dependency that you would try to avoid in a MicroServices architecture.
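To give an idea of what such a central "hub" minimally has to do, here is a small sketch in Python. The class `ProcessAggregator`, the service names and the order ids are hypothetical; the point is only that every MicroService must publish its state together with the shared business key.

```python
class ProcessAggregator:
    """Hypothetical central service that correlates state events by order id."""

    def __init__(self):
        self._events = {}  # order id -> list of (service, state) in arrival order

    def publish(self, order_id, service, state):
        # Every MicroService has to report its state with the shared business key.
        self._events.setdefault(order_id, []).append((service, state))

    def overview(self, order_id):
        # Latest known state per service for one order.
        latest = {}
        for service, state in self._events.get(order_id, []):
            latest[service] = state
        return latest


hub = ProcessAggregator()
hub.publish("order-42", "payment", "started")
hub.publish("order-42", "shipping", "started")
hub.publish("order-42", "payment", "completed")
hub.publish("order-7", "payment", "failed")
status = hub.overview("order-42")
print(status)
```

Note what the sketch cannot tell you: whether a service that should have been involved never reported at all. Answering that requires some notion of the expected process, which is exactly the dependency discussed above.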

Process/Case Operation

Operations will have a similar problem as the business has when they have to operate the process or case. If a process is stuck from a technical perspective, in which MicroService is that? Practically also this type of concern can only be addressed when to some extent there is a sort of common way to log errors, collect those and present them in a consolidated way. Also something that is in conflict with the principle of decentralization, as each MicroService is supposed to be operated independently.

Process/Case Modeling and Testing

And what about modeling and testing a process or case? Capturing how a case may evolve over time in CMMN is already more difficult for the reader to understand than a BPMN process design. But how a process would unfold in a pure MicroServices environment you can only understand if you would model that in some similar way. But in a pure MicroServices architecture that does not seem to make any sense. And if you don't model it you surely will have difficulties testing it.

Authorization & Authentication

Another challenge I would like to point out is authorization and authentication. In BPMN there are swimlanes that correspond to roles that you can assign people to. By using a central repository of these roles you can implement a consistent way of authentication and authorization. In Case Management there are similar concepts (e.g. knowledge workers). How to implement this for a process only consisting of MicroServices when this implies a centralized authentication and authorization model?

Granted, MicroServices is relatively new, still in the hype phase, and over time some of these challenges will be addressed. This will result in new patterns, and frameworks and tools to support that. But I seriously doubt this will ever address all the requirements that are naturally addressed by BPM or Case Management. So over time I believe both will survive the MicroServices hype, although I see Case Management gaining ground over BPM.

MicroServices Values for BPM and Case Management

However, all this does not mean there is no value in adapting at least some of the principles related to MicroServices to BPM and Case Management applications. I can see how it could address some of the issues I faced with processes that are almost too big to handle, and issues with reuse of services and the impact that had on agility. Since then I much more tend to:
  • Design and implement sub-processes as deployable units of their own.
  • Push more of the other logic to a deployable unit of its own than I already did.
  • Let data models be less intrusive to integrations (i.e. choose the Anti-Corruption pattern with small Bounded Contexts over the Conformist pattern), and address data mapping challenges in the (anti-corruption layer of the) individual services rather than in some integration layer (smart endpoints / dumb pipes).
  • Apply the Tolerant Reader pattern more than I already did.
  • Copy and paste code if that prevents unnecessary impact of a change on some shared component.
And where useful and possible one can implement the services consumed by the business process or case as MicroServices and make the process and these services more loosely coupled. But that I already did. The mantra of 'do one thing and do it well' specifically appeals to me. I always try to prevent creating any service (or Java class for that matter) for which I have to use the word "and" to describe what it does.
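As an illustration of the Tolerant Reader pattern mentioned in the list above, here is a minimal sketch in Python. The message layout and the function `read_customer` are made up for the example; the idea is that a consumer picks only the fields it needs and ignores everything else.

```python
import json


def read_customer(message):
    """Tolerant Reader: pick only the fields this consumer needs and ignore
    everything else, so added fields do not break it."""
    data = json.loads(message)
    return {
        # Required by this consumer.
        "id": data["id"],
        # Optional: fall back to an empty string when absent.
        "city": data.get("address", {}).get("city", ""),
    }


# Two versions of the same (hypothetical) message; v2 carries extra fields.
v1 = '{"id": 7, "address": {"city": "Utrecht"}}'
v2 = '{"id": 7, "address": {"city": "Utrecht", "zip": "3511"}, "loyalty": "gold"}'
print(read_customer(v1) == read_customer(v2))
```

Because the reader does not validate the message against a full schema, the provider can add fields without forcing a change on this consumer.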