Abstract
In recent years, database systems have proliferated in all types of organizations. In many cases, these databases are developed in different departments and maintained autonomously. Much is to be gained, however, if databases across departments, divisions, or even organizations can be related to one another. A main problem in relating data stored in different databases is that they represent real-world entities differently, for example by using different identifiers or primary keys. We present a decision-theoretic model for matching entities across different databases. The decision to match two entities from two different databases inherently involves some uncertainty, since an exact match may not be found because of errors in data collection, data entry, and data representation. We model this uncertainty using probability theory and propose an integer programming formulation that minimizes the total cost associated with the entity matching decision. The model has been implemented and validated on real-world data.
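The abstract does not give the formulation itself, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a hypothetical matrix `p` in which `p[i][j]` is the estimated probability that record `i` of one database and record `j` of the other denote the same entity, two hypothetical unit costs (`c_false_match` for linking two different entities, `c_missed_match` for failing to link the same entity), and a one-to-one matching constraint. The binary decision for each pair is chosen to minimize total expected cost; for illustration the tiny instance is solved by exhaustive enumeration rather than by an integer programming solver.

```python
from itertools import permutations


def expected_cost(p, assignment, c_false_match, c_missed_match):
    """Expected cost of one matching decision.

    assignment[i] is the index of the record matched to record i,
    or None if record i is left unmatched.
    """
    total = 0.0
    for i, row in enumerate(p):
        for j, p_ij in enumerate(row):
            if assignment[i] == j:
                # Pair is linked: we pay if the records are in fact different.
                total += c_false_match * (1.0 - p_ij)
            else:
                # Pair is not linked: we pay if the records are in fact the same.
                total += c_missed_match * p_ij
    return total


def best_matching(p, c_false_match=1.0, c_missed_match=1.0):
    """Enumerate all one-to-one (partial) matchings and keep the cheapest.

    Only practical for very small inputs; a real instance would be
    handed to an integer programming or assignment solver.
    """
    n_a, n_b = len(p), len(p[0])
    # Each record in A takes a distinct record in B, or one of the None slots.
    slots = list(range(n_b)) + [None] * n_a
    best_assignment, best_cost = None, float("inf")
    for candidate in permutations(slots, n_a):
        cost = expected_cost(p, candidate, c_false_match, c_missed_match)
        if cost < best_cost:
            best_assignment, best_cost = candidate, cost
    return best_assignment, best_cost


# Hypothetical match probabilities for two records in each database.
p = [[0.9, 0.1],
     [0.2, 0.3]]
assignment, cost = best_matching(p)
print(assignment, round(cost, 3))
```

With equal unit costs, a pair is worth linking only when its match probability exceeds 0.5, so in this example the first record is linked and the second is left unmatched. Raising `c_false_match` relative to `c_missed_match` makes the decision more conservative.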
Publication Info
- Year: 1998
- Type: article
- Volume: 44
- Issue: 10
- Pages: 1379-1395
- Citations: 63
- Access: Closed
Identifiers
- DOI: 10.1287/mnsc.44.10.1379