There are several ways to implement loops in Kettle:
- Use the "Execute for every input row" flag in a job or transformation.
- Circle hops in a job.
- Use the "Repeat" flag in the Start step of the job.
Using the "Execute for every input row" flag in a job or transformation.
This is the safest, most correct and most native way to implement loops in Kettle, but to use it you need to know in advance how many times you want to run the job inside the loop. If you can get or calculate the number of iterations in advance, just use this approach. For more information, I refer you to Slawomir Chodnicki's blog article about it.
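To illustrate why the iteration count has to be known up front, here is a minimal plain-Java sketch of a row-driven loop. It does not use the Kettle API; the class `PerRowLoopSketch` and the methods `buildRows` and `executeForEveryRow` are hypothetical names I made up for illustration. The point is only that the driving rows are fully built before the loop body runs once per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical illustration only: this is not Kettle code.
class PerRowLoopSketch {

    // Builds one driving row per planned iteration (row value = iteration index).
    // The number of iterations must be known before the loop starts.
    static List<Integer> buildRows(int iterations) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            rows.add(i);
        }
        return rows;
    }

    // Runs the body once for every row and returns how many times it ran.
    static int executeForEveryRow(List<Integer> rows, IntConsumer body) {
        int runs = 0;
        for (int row : rows) {
            body.accept(row);
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Integer> rows = buildRows(3);
        int runs = executeForEveryRow(rows,
                row -> System.out.println("Running job for row " + row));
        System.out.println("Total runs: " + runs);
    }
}
```

Because the loop is a plain iteration over a finite list, it always terminates and its depth does not grow with the number of rows.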
Circling hops in a job.
This approach is not safe. When using it, keep in mind that the loop depth can't be too big. If you break this rule, you risk getting a StackOverflowError, because Kettle uses recursive method calls when running this kind of job. So if you believe your loop will not exceed, say, 10,000 or 100,000 iterations (the actual limit depends on the stack size settings of your JVM), you can run the loop this way.
For more information about StackOverflowError in Kettle, see this JIRA issue (http://jira.pentaho.com/browse/PDI-1463). You can find more about implementing this kind of loop in another article by Slawomir Chodnicki.
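The effect of recursion on loop depth can be reproduced in plain Java. The sketch below is not Kettle code; `LoopDepthDemo` and its methods are hypothetical names, and the recursive method only imitates, in spirit, a job whose hops circle back to an earlier entry. Each recursive "iteration" adds a stack frame, so a deep enough loop ends in a StackOverflowError, while the equivalent iterative loop does not grow the stack at all.

```java
// Hypothetical illustration only: this is not Kettle code.
class LoopDepthDemo {

    // Recursive "loop": every iteration adds one more stack frame.
    static int recursiveLoop(int remaining, int done) {
        if (remaining == 0) {
            return done;
        }
        return recursiveLoop(remaining - 1, done + 1);
    }

    // Returns true if the recursive loop completes,
    // false if it runs out of stack first.
    static boolean survives(int iterations) {
        try {
            recursiveLoop(iterations, 0);
            return true;
        } catch (StackOverflowError e) {
            return false;
        }
    }

    // Iterative loop: stack usage stays constant regardless of the count.
    static int iterativeLoop(int iterations) {
        int done = 0;
        for (int i = 0; i < iterations; i++) {
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        // The depth at which the recursive version fails depends on the
        // JVM stack size (for example the -Xss setting).
        System.out.println("Recursive, 100 iterations survives: " + survives(100));
        System.out.println("Recursive, 50,000,000 iterations survives: " + survives(50_000_000));
        System.out.println("Iterative, 50,000,000 iterations done: " + iterativeLoop(50_000_000));
    }
}
```

The exact depth at which the recursive version fails varies from machine to machine, which is why a safe upper bound for this kind of loop is hard to state in general.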
Using the "Repeat" flag in the Start step of the job.
I recommend this approach if you can't use either of the two above. By checking the 'Repeat' flag in the Start step of a job, you can easily make the job run forever. The more important question is how to stop it! The only way to do that in 'out of the box' Kettle (even 4.0) is to use the Abort step. It stops the job and writes an error message to the log saying that the job finished with errors. But that may not be the case! I want my job to stop normally, successfully, without errors. Why can't I do that? Why should I flood the log with ERROR messages when no errors actually occurred? For that reason I implemented a simple plugin called 'Stop Job' that stops a repeating job without writing error messages to the log.
You can use this plugin in the same way you use the Abort step. You can also add a message, which will be written to the log at log level BASIC.
Be aware that when the Repeat flag is used, the job keeps running over and over even when one of its steps fails. To stop the job in this case, use the Abort job step as depicted in the image below.
Here is the Stop Job plugin, which you are free to use and modify.
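The difference between stopping and aborting a repeating job can be sketched in plain Java. This is only a model of the control flow described above, not the plugin's source and not the Kettle API; `RepeatLoopSketch`, `Outcome`, `Result` and `runRepeating` are hypothetical names. The body runs again and again until it asks for a clean stop (finish without errors) or an abort (finish with errors).

```java
import java.util.function.IntFunction;

// Hypothetical illustration only: this is not Kettle or plugin code.
class RepeatLoopSketch {

    // What one pass through the job body asks the loop to do next.
    enum Outcome { CONTINUE, STOP, ABORT }

    // Summary of a finished repeating job.
    static final class Result {
        final int passes;      // how many times the body ran
        final boolean errors;  // true only when the job was aborted

        Result(int passes, boolean errors) {
            this.passes = passes;
            this.errors = errors;
        }
    }

    // Repeats the body until it returns STOP (clean finish)
    // or ABORT (finish flagged with errors).
    static Result runRepeating(IntFunction<Outcome> body) {
        int passes = 0;
        while (true) {
            Outcome outcome = body.apply(passes);
            passes++;
            if (outcome == Outcome.STOP) {
                return new Result(passes, false);
            }
            if (outcome == Outcome.ABORT) {
                return new Result(passes, true);
            }
        }
    }

    public static void main(String[] args) {
        // Clean stop once the fifth pass (index 4) has run.
        Result stopped = runRepeating(i -> i < 4 ? Outcome.CONTINUE : Outcome.STOP);
        System.out.println("passes=" + stopped.passes + ", errors=" + stopped.errors);

        // Abort on the third pass (index 2), as if a step had failed.
        Result aborted = runRepeating(i -> i < 2 ? Outcome.CONTINUE : Outcome.ABORT);
        System.out.println("passes=" + aborted.passes + ", errors=" + aborted.errors);
    }
}
```

Unlike the circled-hops approach, the repeat here is a flat `while` loop, so the number of passes is not limited by stack depth.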

The StopJob plugin has a problem when the job is run using Kettle.bat. The error message is "Unable to read Job Entry copy info from XML node : org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for Stop Job"
Any fix?
Thank You.
The StopJob plugin works fine in version 4.1 but not in version 4.2. Sorry for the lack of information in the previous post.
Thank You.
Hi,
Since this plugin is built on Kettle's Abort step, you can just compare the code changes to the Abort step between 4.1 and 4.2 and then update the StopJob plugin in the same way.