Implementing data cubes efficiently

Abstract

Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
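The greedy selection over a hypercube lattice described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation. It assumes that each view is identified by its set of group-by attributes, that a view can be answered from any materialized view whose attributes are a superset of its own, and that the cost of answering a view equals the row count of the smallest such materialized view. The function names, the three dimensions `p`, `s`, `c`, and the row counts are all hypothetical.

```python
from itertools import combinations

def hypercube_lattice(dims):
    """Return every group-by view over dims as a frozenset of attributes."""
    return [frozenset(c)
            for r in range(len(dims) + 1)
            for c in combinations(dims, r)]

def benefit(v, selected, size, views):
    """Total rows saved across all views if v is materialized too."""
    total = 0
    for w in views:
        if w <= v:  # w is derivable from v (attribute subset)
            # Cheapest already-materialized view that can answer w.
            current = min(size[u] for u in selected if w <= u)
            total += max(0, current - size[v])
    return total

def greedy_select(views, size, k):
    """Pick up to k views greedily, in addition to the top (raw-data) view."""
    top = max(views, key=len)  # the view grouping by every dimension
    selected = [top]           # assumed to be always materialized
    for _ in range(k):
        candidates = [v for v in views if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda v: benefit(v, selected, size, views))
        if benefit(best, selected, size, views) <= 0:
            break              # nothing left that saves any work
        selected.append(best)
    return selected

# Hypothetical row counts for each view of a three-dimensional cube.
views = hypercube_lattice(["p", "s", "c"])
size = {
    frozenset("psc"): 6_000_000,
    frozenset("pc"): 6_000_000,
    frozenset("sc"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset("s"): 10_000,
    frozenset(): 1,
}
chosen = greedy_select(views, size, k=2)
```

With these illustrative sizes, the first pick is the `ps` view, because it lowers the cost of four views at once. The second pick is `c`, since `ps` cannot answer queries grouped by `c`. Views such as `pc` and `sc` are never chosen because they are as large as the raw data and therefore save nothing.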

Keywords

Hypercube, Computer science, Data cube, Greedy algorithm, Data warehouse, Materialized view, Online analytical processing, Cube (algebra), Data structure, Theoretical computer science, Data mining, Algorithm, Mathematics, Combinatorics, View, Database design, Parallel computing

Affiliated Institutions

Stanford University US

Publication Info

Year: 1996
Type: article
Pages: 205-216
Citations: 1161
Access: Closed

Cite This

APA Style

Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman (1996). Implementing data cubes efficiently, 205-216. https://doi.org/10.1145/233269.233333

Identifiers

DOI: 10.1145/233269.233333