The basic problem with relational databases is that data mining against them is sometimes neither scalable nor time efficient. For example, a database may be perfectly normalized and well suited to applications that constantly insert or update data, yet still be slow to analyze.
A database design should be intuitive, allowing for queries that are easy to implement. Normalized databases often store information in dozens or even hundreds of tables. This optimizes performance for inserting and updating data, because only a small amount of actual data is stored in each table and only a few tables are affected each time a SQL transaction is processed.
While all this may be true, once the database reaches a certain size, querying it no longer scales. As a relational database grows, SELECT queries take longer to execute, which is why indexing is an integral component of database design. Beyond that point, applications are often developed to reduce the size of the database in order to keep retrieval times down.
The more information a well-designed relational database stores, the more ways a user may want to gather data from it. A user may want to implement queries to glean different patterns or produce specialized reports. This is where OLAP and Cubes enter the picture.
Question: What is the difference between a cube and a square or rectangle?
Answer: Multidimensionality.
That is a long and confusing word. It reminds me of my favorite movie growing up – “Back To The Future”:
Doc: “Marty! You’re not thinking fourth-dimensionally!”
Marty: “Yeah, right, I have a real problem with that.”
Even programmers have that problem. That can make cubes seem daunting, which is why they are best explained using good old-fashioned 3-dimensionality. A relational database table, like an Excel spreadsheet, has 2 dimensions: rows and columns. A cube is a square drawn 3-dimensionally, so it is a good starting point for moving beyond the conventional 2 dimensions.
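To make the jump from 2 dimensions to 3 concrete, here is a minimal Python sketch. The sales figures and the three dimensions (product, region, quarter) are made up for illustration, and a real OLAP engine does not store its cube as a plain dictionary. The point is only that each cell is addressed by three coordinates instead of a row and a column.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, units sold).
rows = [
    ("Widget", "East", "Q1", 10),
    ("Widget", "West", "Q1", 7),
    ("Gadget", "East", "Q1", 4),
    ("Widget", "East", "Q2", 12),
]

# A toy "cube": each cell is keyed by three coordinates rather than two.
cube = defaultdict(int)
for product, region, quarter, units in rows:
    cube[(product, region, quarter)] += units


def cell(product, region, quarter):
    """Return the units sold at one point in the cube (0 if the cell is empty)."""
    return cube.get((product, region, quarter), 0)


print(cell("Widget", "East", "Q2"))
print(cell("Gadget", "West", "Q1"))
```

A spreadsheet could hold product by region for a single quarter. Adding the quarter as a third coordinate is what turns the square into a cube.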
Data mining architectures, on the other hand, are built for fast data analysis. To achieve this, data is stored in a Data Warehouse, where it is de-normalized via a dimension-based model: a Cube.
To enhance the speed of data retrieval, data is stored in its most granular form and is also pre-summarized in a number of different ways. These precomputed summaries are called Aggregates.
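As a rough illustration of the idea, the sketch below treats an aggregate as a total that is computed once, ahead of time, so that later questions become simple lookups. The data and the `build_aggregate` helper are hypothetical, and this is not how any particular OLAP product builds its aggregates.

```python
from collections import defaultdict

# Hypothetical granular facts: (product, region, quarter, units sold).
facts = [
    ("Widget", "East", "Q1", 10),
    ("Widget", "West", "Q1", 7),
    ("Gadget", "East", "Q1", 4),
    ("Widget", "East", "Q2", 12),
]


def build_aggregate(facts, keep):
    """Sum units, grouped by the dimension positions listed in `keep`."""
    totals = defaultdict(int)
    for *dims, units in facts:
        key = tuple(dims[i] for i in keep)
        totals[key] += units
    return dict(totals)


# Computed once up front; afterwards each question is a dictionary lookup.
by_product = build_aggregate(facts, keep=(0,))
by_region_quarter = build_aggregate(facts, keep=(1, 2))

print(by_product[("Widget",)])
print(by_region_quarter[("East", "Q1")])
```

The trade-off is extra storage and extra work at load time in exchange for faster reads, which suits analysis workloads where data is queried far more often than it changes.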
A cube is a data structure whose design allows for speedy analysis of information. Besides speed, cubes are capable of manipulating and analyzing data from many different perspectives, or Pivots. The way cubes organize data overcomes many of the limitations of relational databases described above.
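To show what looking at the same data from different perspectives can mean, here is a small Python sketch. The facts, the dimension names and the `pivot` helper are assumptions made for this example. It simply picks two dimensions to act as rows and columns and sums everything else away.

```python
from collections import defaultdict

# Hypothetical dimension names, in the order they appear in each fact row.
DIMENSIONS = ("product", "region", "quarter")

facts = [
    ("Widget", "East", "Q1", 10),
    ("Widget", "West", "Q1", 7),
    ("Gadget", "East", "Q1", 4),
    ("Widget", "East", "Q2", 12),
]


def pivot(facts, row_dim, col_dim):
    """Total units with `row_dim` as rows and `col_dim` as columns."""
    r = DIMENSIONS.index(row_dim)
    c = DIMENSIONS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for *dims, units in facts:
        table[dims[r]][dims[c]] += units
    return {row: dict(cols) for row, cols in table.items()}


# Two different views of the same underlying facts.
product_by_quarter = pivot(facts, "product", "quarter")
region_by_product = pivot(facts, "region", "product")

print(product_by_quarter)
print(region_by_product)
```

Nothing about the stored facts changes between the two calls; only the choice of which dimensions to line up against each other does.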
In this blog I plan to describe cubes in the context of OLAP (Online Analytical Processing) databases, as well as how and when to use them. I will define all the terms needed to understand cubes by explaining:
• Dimensions
• Measures
• Members
• Hierarchies
• Data sources
• Pivots
• MDX
Once we have an OLAP cube database up and running, we will learn MDX (Multidimensional Expressions), a SQL-like language designed to glean data from multiple pivots.