OLAP system in JAVA

Posted by | · · · · | Software developement | No Comments on OLAP system in JAVA

What is OLAP?

The abbreviation OLAP stands for On-Line Analitical Processing.
As Java programmers, we do not encounter OLAP systems very frequently. When one is developing a Java EE application, it is more probable that they are building an OLTP system. (On-Line Transaction Processing) There is no reason to be upset if you do not completely know each of these abbreviations! OLAP/OLTP systems have serious literature, thus here I will only summarize their characteristics in a nutshell.

OLAP vs OLTP

OLTP OLAP
Purpose Business logic and processes Decision support, management information, statistics
System Operational System Data Warehouse
Operations CRUD(INSERT,UPDATE,DELETE) operations Queries
Model Business entities in relational database Multidimensional cubes (star scheme)
Normalizedness Normalized (usually 3NF relational database) Denormalized data structure, redundant data storage
Data Rapidly changing data (data integrity is important)) Aggregated and historical data in multi-level division (e.g. year – quarter – month)
Performance Several short transactions (tr per time unit) Complex queries that might run for a long period, rare uploads, with scheduled background processes

olap4j, Mondrian

During one of our projects the idea emerged that we should use OLAP for statistical queries. As it was a Java project, we mainly looked for Java-based OLAP solutions. This is how we found olap4j which is a Java API for OLAP. The API – not accidentally – is very similar to JDBC, and is also based on it, so it was relatively easy and fast to create the first working code within it. To run a query for instance, we similarly need a connection, then a statement from it, and through implementing it we receive the result of the query wrapped in some form of resultset.

olap4j vs JDBC

JDBC olap4j
java.sql.Connection org.olap4j.OlapConnection
java.sql.Statement org.olap4j.OlapStatement
java.sql.ResultSet org.olap4j.CellSet

From olap4j we reached the Mondrian which is an OLAP engine and also the reference implementation of olap4j. The Mondrian is a ROLAP (Relational OLAP) system. It means that the multidimensional data model is implemented in a relational database. In ROLAP in the simplest case a cube (star schema) is created as the fact table refers to several dimension tables. In more complicated cases the fact table and the dimensions might also be divided into several tables (snowflake schema). It must be mentioned that the task of Mondrian here is to manage the multidimensional data model, and to interpret and run queries, but not to fill the data structure corresponding to the model (fact tables and dimension tables) with data. The filling of multidimensional cubes from a source system is called ETL (Extract, Transform, Load) procedure. Thus, we carried this out completely independently from the Mondrian, with Talend. The ETL procedure is always unique, it depends on the structure and access of the source systems, and naturally on the multidimensional data model we intend to build, thus I will write no more about it here.

MDX query language

The MDX (MultiDimensional eXpression) query language was developed for OLAP systems. The language is highly similar to SQL, but while the latter works with relational data tables, the MDX uses the elements of the multidimensional data model: cube, dimension, fact data. Similarly to SQL, MDX also has dialects. Documentation about the MDX dialect of Mondrian can be found here.

Multidimensional xml schema

Mondrian stores the multidimensional data model in an XML schema. Cubes, Dimensions and the relational database structure corresponding to the model can be defined in the XML schema. When creating an OlapConnection, apart from the access of the database, the location of the XML schema must also be provided in the connection string. Mondrian constructs the multidimensional data model in the memory on the basis of the above. The Mondrian documentation contains several references to the FoodMart exemplar data model. This is a fictional multidimensional data model created for the operation of a supermarket chain. For instance, the Sales cube stores the sales data of the supermarket chain sorted according to several dimensions (product, time, store, the gender or qualifications of customers, etc.).

The definition of dimension in the model

In the XML schema a dimension can be defined with the Dimension element. The following example shows the (Store) dimension of the FoodMart model supermarket. The dimensions are hierarchically structured. The store dimension for instance includes country/state/city/given store levels.

    <Dimension name="Store">
        <Hierarchy hasAll="true" primaryKey="store_id">
            <Table name="store"/>
            <Level name="Store Country" column="store_country" uniqueMembers="true"/>
            <Level name="Store State" column="store_state" uniqueMembers="true"/>
            <Level name="Store City" column="store_city" uniqueMembers="false"/>
            <Level name="Store Name" column="store_name" uniqueMembers="true"/>
        </Hierarchy>
    </Dimension>

The definition of the cube

In the FoodMart example, the Sales cube uses the Store dimension. In the XML schema we can refer to the globally created dimension with the DimensionUsage element. With the Dimension element we can also create a local dimension (which is only visible in the cube). To achieve this, the Dimension must be placed inside the given Cube element.

<Cube name="Sales" defaultMeasure="Unit Sales">
   <!-- fact table -->
   <Table name="sales_fact"/>

   <!-- using globally defined Store dimension  -->
   <DimensionUsage name="Store" source="Store" foreignKey="store_id"/>

   <!-- other dimensions, cube level definitions, and referencies -->

   <!-- at least one fact data in the space created by dimensions -->
   <Measure name="Unit Sales" column="unit_sales" aggregator="sum"
           formatString="Standard"/>
</Cube>   

The use of the OlapConnection and the OLAP Schema használata

In the code section below the JDBC-based connection management and the importation of the data model schema can be seen. The Schema object is constructed on the basis of the XML schema provided in the connectionString.

import java.sql.Connection;
import java.sql.DriverManager;
import org.olap4j.OlapConnection;
import org.olap4j.OlapWrapper;
import org.olap4j.metadata.Schema;

Connection connection = DriverManager.getConnection(connectionString);
OlapWrapper wrapper = (OlapWrapper) connection;
OlapConnection olapConnection = wrapper.unwrap(OlapConnection.class);
Schema schema = olapConnection.getOlapSchema();

Entering cubes and dimensions

Through the Schema object we can enter into the elements of the multidimensional model: the cubes, dimensions, etc.

import org.olap4j.metadata.Cube;
import org.olap4j.metadata.Dimension;
import org.olap4j.metadata.Hierarchy;
import org.olap4j.metadata.Level;
import org.olap4j.metadata.Schema;

Cube cube = schema.getCubes().get("Sales");
Dimension dimension = cube.getDimensions().get("Store");
Hierarchy hierarchy = dimension.getDefaultHierarchy();
Level level = hierarchy.getLevels().get("Store City");

Running MDX queries

MDX queries can be compiled in a textual form, or by using the elements of the model dynamically, in a typesafe mode. (For the latter, see the ParseTreeNode interface and others in the org.olap4j.mdx package).

import org.olap4j.CellSet;
import org.olap4j.OlapConnection;
import org.olap4j.OlapException;
import org.olap4j.OlapStatement;

String mdx = "SELECT {[Time].[Quarter]} ON COLUMNS,"
                + "  {[Product].[Product Family]} ON ROWS"
                + " FROM [Sales]";
OlapStatement statement = olapConnection.createStatement();
CellSet cellSet = statement.executeOlapQuery(mdx);

Thus, we receive the result of the query on a CellSet interface. The required data will be available if we enter into the CellSet (along Axises, obtaining information from Cells).

SAIKU

In the course of the development we discovered SAIKU, which is a Mondrian-based OLAP application. SAIKU was very useful for us as it helped us become familiar with OLAP and Mondrian. We can construct the query on its web interface; moreover it presents the MDX query, which means a huge help in learning about the MDX language. An Online demo is available for SAIKU. The web interface of SAIKU is the following:

SAIKU Demo

Resources

OLTP vs OLAP
Adattárházak
OLAP vs OLTP: What makes the difference?
olap4j
Mondrian
FoodMart séma
SAIKU
SAIKU Online Demo


No Comments

Leave a comment