Running Mediator instances issue

We encountered an issue with one of our clients where the SOA purge wasn’t very effective because Mediator instances were still in a running state even though the rest of the flow trace had completed. This wasn’t an issue for the business as such, but in most cases it caused the instances to fall outside the purge criteria because of the state these Mediator instances were in.
This should not cause much of a problem for clients with low volumes; however, for larger clients whose daily volumes run into millions, this can be a big problem.
As always, the first step to solving a problem is identifying the root cause, so we wrote the following query to identify any running Mediator instances where the composite itself has completed.
Query to be run on the SOA Infra DB (SOA 11g).
SELECT comp.*
FROM mediator_instance mi,
     composite_instance comp,
     cube_instance ci
WHERE mi.composite_instance_id = comp.id
AND mi.composite_creation_date > (sysdate - 12) -- select the number of days you want to run this for
AND comp.created_time > (sysdate - 12) -- select the number of days you want to run this for
AND comp.state IN (1,3,9,11,17,19,21,23,25,27,29,31)
AND mi.component_state IN (1,2,8)
AND ci.cmpst_id = comp.id
AND ci.creation_date > (sysdate - 12)
AND ci.state IN (4,5,6,7,8,9,10)
ORDER BY comp.created_time;
For a list of what these states mean, refer to my earlier post: http://nitinaggarwal.wordpress.com/2013/06/12/soa-11g-soa-infra-db-states-for-soa-composites-and-components/
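To see how the filters in the query combine, here is a minimal sketch that runs the same join against a toy in-memory SQLite database. It is only an illustration under stated assumptions: the three tables are cut down to the columns the query touches, `sysdate - 12` is swapped for SQLite's `datetime('now', '-12 days')`, the helper names `build_demo_db` and `find_stuck_composites` are made up for this example, and the state values in the sample rows are simply picked from inside or outside the filter lists without claiming what they mean. It does not replace running the real query on the SOA Infra DB.

```python
import sqlite3

# Stand-in schema: only the columns the query touches (not the real SOA Infra DDL).
SCHEMA = """
CREATE TABLE composite_instance (id INTEGER PRIMARY KEY, state INTEGER, created_time TEXT);
CREATE TABLE mediator_instance (id INTEGER PRIMARY KEY, composite_instance_id INTEGER,
                                component_state INTEGER, composite_creation_date TEXT);
CREATE TABLE cube_instance (cikey INTEGER PRIMARY KEY, cmpst_id INTEGER,
                            state INTEGER, creation_date TEXT);
"""

# Same joins and state filters as the Oracle query; only the date arithmetic differs.
QUERY = """
SELECT DISTINCT comp.id, comp.created_time
FROM mediator_instance mi, composite_instance comp, cube_instance ci
WHERE mi.composite_instance_id = comp.id
  AND mi.composite_creation_date > datetime('now', :window)
  AND comp.created_time > datetime('now', :window)
  AND comp.state IN (1,3,9,11,17,19,21,23,25,27,29,31)
  AND mi.component_state IN (1,2,8)
  AND ci.cmpst_id = comp.id
  AND ci.creation_date > datetime('now', :window)
  AND ci.state IN (4,5,6,7,8,9,10)
ORDER BY comp.created_time, comp.id
"""


def build_demo_db():
    """Create the toy tables and a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    recent, old = "-1 day", "-30 days"
    # (id, state, age): 101 and 103 use a state from the filter list, 102 does not.
    conn.executemany(
        "INSERT INTO composite_instance VALUES (?, ?, datetime('now', ?))",
        [(101, 1, recent), (102, 0, recent), (103, 1, old)],
    )
    # Composite 101 has one mediator row inside the filter (2) and one outside it (0).
    conn.executemany(
        "INSERT INTO mediator_instance VALUES (?, ?, ?, datetime('now', ?))",
        [(1, 101, 2, recent), (2, 101, 0, recent), (3, 102, 1, recent), (4, 103, 8, old)],
    )
    conn.executemany(
        "INSERT INTO cube_instance VALUES (?, ?, ?, datetime('now', ?))",
        [(11, 101, 5, recent), (12, 102, 5, recent), (13, 103, 5, old)],
    )
    return conn


def find_stuck_composites(conn, days=12):
    """Return ids of composites matched by the query within the last `days` days."""
    rows = conn.execute(QUERY, {"window": f"-{days} days"}).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_stuck_composites(demo))           # default 12-day window
    print(find_stuck_composites(demo, days=60))  # wider window also reaches older rows
```

In this toy data, composite 102 is skipped because its state is outside the composite filter, and composite 103 only shows up once the window is widened beyond 30 days, which is the same effect as changing the `12` in the original query.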
Once you have identified the affected composites, you can look up the details in EM or by writing further DB queries.
In most cases the problem lies in the way the code is written and hence needs to be fixed there.
In certain scenarios we noticed the following: a BPEL process calls a Mediator (sequentially), and that Mediator hits a fault while calling a reference.
Due to the fault policies, the BPEL process retries as configured, which initiates another call to the Mediator, and that call may succeed if the root cause has been resolved in the meantime.
As a result, the composite is marked as completed; however, the status of the earlier Mediator instances is not updated, and they are left either running or in one of the other non-purgeable states.
There isn’t much you can do in this scenario: with SOA 11.1.1.4 there is no option to abort specific components, and you can’t abort a composite that has already completed.
Having spoken to my contacts, I found out that there is a similar bug in the SOA 11.1.1.7 product and that a fix will be made available as part of the early patches to the 11.1.1.7 release.
One workaround for such a scenario is to undeploy or redeploy the composite, which marks all its instances as stale and thus makes them eligible for purging.
For most other cases we implemented code changes that got rid of any such occurrences.
We also implemented a lot of performance tuning changes to the engine settings, which helped us a lot, but the details of those will come in a later post.
