PDI by itself is also a great consolidating tool. I won't praise it as much as BI server just to save your time, but when consider diversity of data sources and technologies Kettle is even more sociable then BI Server. Just have a look at Kettle plugin list to get it. I believe it can run data to earth from any data source you want.
And now imagine you tie these great tools together. You obtain fine-grained control over Kettle jobs. You can ask user for parameters before passing them to Kettle job. You can use Kettle job as a smart data source for your reports. Using java quartz lib you can schedule Kettle jobs to run with a second-precision (for comparison linux cron allows only minutes-precision).
But many people abandon this power and run Kettle as a separate java process. I saw a lot of unanswered question on forums about running Kettle jobs in BI Server. Up to Kettle version 4.0 some bugs was in the way of using BI Server - PDI couple tied together in enterprise environment. An example is a call of System.gc() in every Kettle job.
Below are some advices for those bold spirits who decide to integrate these tools.
Setting up Kettle repository
At first you need to show BI server where Kettle repository is. The easiest way to do it is to configure repository settings with Spoon tool. After that open User Home folder on you machine and find .kettle folder. You need to place .kettle to User Home folder of the machine where BI Server is running.
Then open file pentaho-solutions\system\kettle\settings.xml. Use following settings to connect to DB-based kettle repository:
1. <repository.type>rdbms</repository.type>
2. <repository.name>kettle_repo</repository.name>
3. <repository.userid>admin</repository.userid>
4. <repository.password>admin</repository.password>
admin/admin are default user name/password.
That's it.
Custom plugins.
If you have custom kettle plugins you need to put it in one of following places: <PATH>/plugins, <User Home>/.kettle/plugins, <pentaho-solutions>/system/kettle/plugins.
Running jobs
And now to run Kettle job just add an action like that to your action sequence:
<action-definition>For start use 'detail' value for kettle-logging-level. It writes enough logs but don't floods log files.
<component-name>KettleComponent</component-name>
<action-type>Pentaho Data Integration Job</action-type>
<action-inputs>
<kettle-logging-level type="string"/>
</action-inputs>
<action-resources/>
<component-definition>
<directory><![CDATA[job-dir]]></directory>
<job><![CDATA[job-name]]></job>
</component-definition>
</action-definition>
Hi Vasili,
ReplyDeleteI tried the way u mentioned here,i did changes to setting.xml file ,i got an issue saying repository not found.
error:- Kettle.ERROR_0004 - The repository specified was not found (Local)
Looking forward for you help to enable my BI server to find kettle repository settings.
Thanks,
Surya
Hi Surya,
ReplyDeleteI'll just give you the general ideas, for details please see the Pentaho forums. Like this one:
http://forums.pentaho.com/showthread.php?78458-PDI-and-database-repository-The-repository-specified-was-not-found
When you design transformations in Spoon, you are connecting to Kettle repository from Spoon, and saving jobs and transformations into this repository. While you defining repository connection in Spoon, it saves database information to file .kettle\repositories.xml in your user home directory. If you run BI Server on same machine, it will automatically find this settings, but it's you run on different machine, you need to copy this file there.
Actually there are many ways to run ETL in BI server. In this article I've described how to run transformations from kettle repository. But as an alternative way you may try to run transformation by simply putting it into pentaho solutions. Check this sample in BI Server for details pentaho-solutions\bi-developers\etl\SampleTransformation.xaction
Hi Vasili,
ReplyDeleteI am trying to run the job which is stored in my local repository using an xaction. I have made the changes in settings.xml and yes i have a repositories.xml file in my $user/.kettle directory but when i am trying to run the xaction i am really having a situation here. the error is "The repository specified not found(myrepo)". I am running the xaction on the same machine but still not able to run it. I really want to run a job from my repository using action sequence. Hope one more pair of eyes could help me with it.
Thanks,
Santosh
Hi Santosh and Surya,
ReplyDeleteAre you working together or it's a different issues?
Try this checklist:
In Spoon menu, go to Tools -> Repository -> Connect. Choose the repository you have created. Enter user name and password.
1. Verify that in settings.xml you entered same values to repository.name, repository.userid and repository.password.
2. Connect to repository. Verify that you able to connect to it.
Now, go to Tools -> Repository -> Explore. Find the job you need to run.
3. Verify that in xaction you have entered same directory and job names. On my local environment I've put that job in a folder, so I'm not sure if running from root folder will work.
4. Open repositories.xml in text or xml editor, and verify that it's have description of yours repository.
If that's not helps - create a question on pentaho forums and attach all mentioned files to it (repositories.xml, settings.xml, action sequence ). And post the link to question here. Then we can continue discussion on pentaho forums. Because Pentaho forums much more suitable for such discussions.
Hi vasili,
ReplyDeletenot sure who Santosh is...? , but issue seems to be same..!!
Q:-which version of Kettle and BI Server is used
Ans:- pentaho 4.5.0 GA (4.5.0 GA Suite Release)
Q :- which database
ans:- mysql
Q:- show the content of repositories.xml
ans
PentahoEnterpriseRepository
Local
Local
http://localhost:9080/pentaho-di
N
Q:- action sequence file.
Ans :-
KettleComponent
Pentaho Data Integration Job
Q:- 1. Verify that in settings.xml you entered same values to repository.name, repository.userid and repository.password.
ans :- yes , repository.name, repository.userid and repository.password values are same and correct , i can connect to spoon using specified credentials .
q:- Now, go to Tools -> Repository -> Explore. Find the job you need to run.
Yes i can find my job through explore (spoon GUI)
Q:- Open repositories.xml in text or xml editor, and verify that it's have description of yours repository.
yes my repositories.xml file has repository name
While you defining repository connection in Spoon, it saves database information to file .kettle\repositories.xml in your user home directory from your blog
what version of pentaho are you using.?
we have 2 different file kettle folder .
1) kettle.properties this save DB connection information
2) repository.xml file this save repository details,URL .
thanks for your help,
Surya
Santosh, you may also join the discussion on pentaho forums:
ReplyDeletehttp://forums.pentaho.com/showthread.php?143338-The-repository-specified-was-not-found
Hi Vasili ,
ReplyDeleteneed your suggestion/idea/comment in below topic .
I can execute a independent scheduled transformation from pentaho DI server console .But issue on running a parametrized scheduled transformation from pentaho DI server console .How can i pass parameter value at run time .
In pentaho BI server , to execute parametrized report we used to pass variable value in URL . tried same in pentho DI server as below
http://localhost:9080/pentaho-di/kettle/startTrans?name=UI_parameter&Values=Testvalue
any suggestion would be a gr8 help
ReplyDeleteDI server is a part of Pentaho Enterprise Edition. I don't use it, so can't help you. Try asking your question on Pentaho forums, or search documentation of enterprise edition.
ReplyDeleteMy bad ,thanks for the inputs ...!!!
Delete