Saturday, February 4, 2012

OLAP tools review: Introduction


I'm impressed. And this impression finally pushed me to start these series of articles about Data Analysis Tools. I'm going to review and compare several OLAP Systems. Actually the compared set of tools is now limited to java web application which I had a chance to work with as a Buisness Intelligence consultant. They are: JPivot, Jaspersoft OLAP, Pentaho Analysis and may be I'll add something in future.

So what did impress me? Today it was a Pentaho Analyzer Report (Pentaho Analysis) which is a part of Pentaho BI Server Enterprise Edition. I really like how its usability was improved. It's not an advertisement article, so don't stop reading it after a praise to Pentaho, in the review I'll try to look deeper to functionality of each tool and as well into some technical details and I'll try to be unbiased. All in all most of these tools have very similar functionality. And also I want to find directions for these tools to evolve.


The purpose of OLAP systems.

Let's look at OLAP from different sides. At first let's look at OLAP as at Decision Support Systems (DSS). I'm not going to give a strict definition, but essentially what Decision Support Systems used for? To help people make decisions. How do they help? Decision Support System provides information required by the person for making a decision. In this sense Google Search is the best Decision Support System, it brings you the information required for making decisions. But Google Search can't help us calculate company's expenses for last month. Then let's refine our definition for OLAP. OLAP are business-oriented systems which work with figures. 

By classification of Daniel Power we deal with data-driven DSS. A data-driven DSS provide operational and strategic business intelligence using internal company data and sometimes external data. You may read more about it in Daniel's article, but we go further.

Also I want to compare DSS to Expert Systems. Decision Support Systems provides the information required by an expert for making a decision. In contrast to them, Expert Systems make decisions themselves and provide explanations to an expert. The point of intersection between them is an understanding how an expert explores the information in the process of making a decision. An understanding of expert's needs is a key to improve existing DSS systems.

To summarize, we talk here about OLAP systems which are kind of data-driven Decision Support Systems.

OLAP and Data Warehouse intuition

Let me suggest you an intuition behind OLAP Systems.

Suppose you earn a lot of money but you have found out that they all get spent somewhere. What will be the solution? I would suggest making notes of your expenses. If you open an Excel sheet and create a table, like one shown below, and will start filling it every day, you may become proud that you have build your first Data Warehouse.


Date of purchase Category Name of product or service    Cost
1 Jan 2010 food cheese $5.45
1 Jan 2010 food milk $3.30
2 Jan 2010 entertainment visit to theater $160.00
3 Jan 2010 car gasoline $50.00

Now after a month of doing these records you may calculate many useful aggregate values like: total expenses for this month, total expenses for each of categories, total expenses for gasoline. This is essentially what OLAP systems do for you. They calculate aggregated values filtered by different criteria.

That is to say you'll be able to ask OLAP about sums of these expenses or about lists of individual expense records. Like: "How much money was spent this month?", "How much money was spent last month for a car?", "Show me the bar chart with comparison of expenses in each month of this year." As well as much more complex questions.

Note that you always aggregate values from cost column. Such columns as cost are called Measure columns. There is no reason to sum values from other columns in our table. As example there is no sense to sum product names. Other columns are used to group or to filter costs: by month, by year, by product category. These grouping and filtering columns are called Dimensions (and sometimes they called Filters).

It's important for the user of OLAP system to have a clear understanding what is represented by individual line of a Data Warehouse table. In our case it's a single purchased item details. But when you look at these data from OLAP, you usually see sums of these items like total cost for some period of time. Usually the easiest task in OLAP is to see a grand total over all items, and a little more time consuming is to navigate to detailed subset of these items.


Dimension structure

OLAP system usually deals with large volumes of data. And the diversity of Dimension values can grow large. To help users navigate data, OLAP system allows structuring Dimension values into a tree form. The most popular example is a Date Dimension, which can have branches for each year, then each year will have a branch for quarters, each quarter branches for month, months will have leaves for days.


In the next article I'll try to build a basis for reviewing and comparison of OLAP tools.


4 comments: