Context-based Data Quality Metrics in Data Warehouse Systems

It is widely recognized in the research community that Data Quality (DQ) depends on the context in which data are produced, stored and used. Data Warehouse Systems (DWS), whose main goal is to support data-based decision making, have grown considerably in recent years, in both research and industry. DQ is essential in this kind of system. This work presents a proposal for identifying DQ problems in the domain of DWS, considering the different contexts that exist in each system component. The proposal may act as a first conceptual framework that guides those responsible for DQ in managing DQ in DWS. The main contributions of this work are a thorough literature review of how contexts are used for evaluating DQ in DWS, and a proposal for assessing DQ in DWS through context-based DQ metrics.


Introduction
Data Quality (DQ) is a wide research area that covers many aspects, issues and challenges. It is also extremely relevant for industry because of its impact on information systems in all application domains. DQ refers to a set of characteristics that data should have, such as accuracy, currency and completeness [1]. The consequences of low-quality data are often experienced in daily life, for example when a wrong address or duplicated customer information causes problems in customer relationships. Moreover, when multiple heterogeneous sources must be integrated, the DQ of the sources is usually unknown and must be evaluated and improved as part of the integration process [2].
It is widely recognized in the research community that DQ depends on the context in which data are produced, stored and used [3]. In line with this idea, the fitness for use approach, first presented in [4], is also widely adopted; it considers quality to be entirely dependent on how adequate the data are for their intended use.
Context is also a very wide research area with many different applications. Many authors have identified the importance of context for data, for example in [5]. For instance, in the Pervasive Computing domain, where applications operate in highly dynamic environments, context is needed to select the data relevant to the user: these data should suit the user environment, and this environment determines the context [6].
The Data Warehouse research area has developed enormously during the last two decades, in particular in the topics of design, loading and maintenance, and OLAP (On-Line Analytical Processing). At the same time, the use of these systems in industry has grown significantly and tools for their implementation and exploitation have proliferated. Moreover, in recent years the increasing volume of available data and the growing importance of data have added to the relevance of Data Warehouse Systems (DWS), whose main goal is to support data-based decision making. DQ is essential in this kind of system. It can be addressed from two points of view: (i) the final user should be aware of the quality of the data received, and (ii) the system should be able to control and improve the quality of the data it receives, processes and provides.
This work presents a proposal for identifying DQ problems in the domain of DWS, considering the different contexts that exist in each system component. The proposal may act as a first conceptual framework that guides those responsible for DQ in managing DQ in DWS.
The main contributions of this work are the following: (i) a literature review identifying and analyzing the proposals that relate the topics of DW, DQ and Contexts, (ii) the definition of contexts in DWS and DQ metrics based on these contexts, and (iii) a case study showing the application of the proposal. This paper is an extension of [7], where we presented a reduced literature review and one metric for each kind of DQ defined. In this paper we present a wider literature review where we consider many more works and additionally, we present some relevant results summarized into tables. On the other hand, the present work adds more DQ metrics that show different strategies for measuring DQ with context-based metrics.
The rest of the paper is organized as follows: Section 2 introduces the three research areas involved in this work, Section 3 presents a literature review, Section 4 presents an approach for assessing DQ taking contexts into account, Section 5 presents a case study in which a set of DQ metrics is defined, and finally, Section 6 presents the conclusions and future work.

Data Quality, Data Warehouse Systems and Contexts
This section presents the main concepts of the research areas over which this work is built: Data Quality, Data Warehouse and Contexts.
The impact of low DQ on organizations, and its strong influence on all kinds of information systems in all application domains, is a widely recognized problem. However, the meaning of data quality is not always understood, and the literature still offers different definitions of it. In [8] DQ is defined as the capacity to satisfy the requirements for the use of data. According to the authors, DQ depends on the intended use of the data as well as on the data itself. In [9] DQ is defined as the fitness for use of the data. This concept is widely adopted in the DQ literature, for example in [1][3][10][11]. In [1] the authors analyze the different definitions of DQ proposed since the 1990s, and they remark that a definition covering the main characteristics considered by most existing proposals is needed. They conclude that DQ refers to a set of quality dimensions, generally defined as quality properties or characteristics, which are grouped in four categories: accuracy, currency, completeness and consistency. They also state that these are the dimensions most often associated with DQ. However, they note that the literature has not agreed on the set of dimensions that characterize DQ or on their meanings, and that no standard has been established yet.
Among all the existing kinds of decision support systems, DWS are the ones to which the academic and industrial communities have paid most attention [12]. In [13], Inmon gives the following definition: "A DW is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". The data offered by a DW are extracted from various sources, and then integrated, cleaned and transformed in order to be used at the moment of decision making [12][13]. One big challenge for this kind of system is integrating multiple heterogeneous data sources.
DW are based on the multidimensional model [14]. This model allows a better understanding of data and leads to better performance in complex queries. In multidimensional models data are presented in an n-dimensional space, generally known as a data cube or hypercube, where the axes are the dimensions of analysis and the points in the space contain the indicators, called measures. Each coordinate of the cube is called a fact. The dimensions are structured in hierarchies that give the criteria for data aggregation, which is an essential operation in multidimensional analysis. The hierarchies are composed of levels, and the instances of a level are called members. For example, the dimension Time may have a hierarchy composed of the levels date, month, quarter and year; 'Q1', 'Q2', 'Q3' and 'Q4' are members of the level quarter. Fig. 1 shows an example of a data cube [14].
A DWS is an information system whose main component is a DW, and whose general architecture comprises the components that make possible the whole process from the data sources to the end user. In [12] different architectures are presented, but the two-tiered architecture, shown in Fig. 2, is stated to be the most referenced one. It highlights the separation between the sources and the DW, while representing the four main stages in the data flow:
• Sources: They contain the raw material of a DWS and they may be internal to the organization (operational systems, e-documents, etc.) or external to it (web data, other organizations' data, etc.).
• Data staging: ETL (Extraction, Transformation and Loading) processes are used for loading the data obtained from the sources into the DW. The data staging area is a temporary storage for extracted data, over which all data integration and transformation processes are executed before data are loaded into the DW.
• Data Warehouse: The DW includes all corporate data, while a Data Mart (DM) contains a data subset or aggregated data from the DW and is designed for a specific sector of the organization, i.e. for a specific analysis domain. The metadata contain information about the DW data, such as source information, access procedures, user information, data provenance, etc. [12].
• Analysis layer: The OLAP server provides the support needed for multidimensional data organization and manipulation, allowing the execution of multidimensional operations over data [15]. A variety of client tools allow users to perform interactive analysis, such as OLAP, data mining, and specific analysis tasks.
As said before, the importance of considering the context in DQ management is widely recognized. However, the literature does not offer a concise and globally accepted definition of the concept of context. In fact, there are many conceptualizations and definitions of it, whose approaches depend on the research domain where it is applied. For example, in [16] the context is defined as the possibility of selecting data according to the user environment, while in [5] the authors consider that the context is a set of variables of interest that influence the actions of an agent. In [17] a set of definitions extracted from the Web is analyzed. The authors consider that it is difficult to find a relevant definition that satisfies all disciplines, and they mention that there are still few ideas about the relevant properties that should be considered when modeling context.
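The multidimensional concepts introduced earlier in this section (measures, levels, members and aggregation along a hierarchy) can be illustrated with a minimal sketch. It assumes a tiny cube with the dimensions Time and Product and a single measure; the data values, the function names quarter_of and roll_up_to_quarter, and the choice of sum as aggregation function are illustrative assumptions and are not taken from [14].

```python
from collections import defaultdict
from datetime import date

# Hypothetical cube: each key is a coordinate (member of level 'date',
# product) and each value is the measure (quantity sold).
facts = {
    (date(2015, 1, 15), "milk"): 10,
    (date(2015, 2, 3), "milk"): 5,
    (date(2015, 4, 20), "milk"): 7,
    (date(2015, 5, 9), "bread"): 4,
}


def quarter_of(d: date) -> str:
    """Map a member of level 'date' to its parent member of level 'quarter'."""
    return f"Q{(d.month - 1) // 3 + 1}"


def roll_up_to_quarter(cube):
    """Aggregate the measure from level 'date' to level 'quarter' (sum assumed)."""
    result = defaultdict(int)
    for (d, product), quantity in cube.items():
        result[(d.year, quarter_of(d), product)] += quantity
    return dict(result)


by_quarter = roll_up_to_quarter(facts)
```

In this sketch the two January and February facts for milk collapse into a single cell for the member 'Q1', which is the kind of aggregation that the hierarchy date, month, quarter, year makes possible.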

How is DQ managed in DWS?
DQ in DWS is a widely studied issue. Many works focus on solving DQ problems in the stage of data source identification and selection, and in the ETL stage. However, the existing work shows that it is not yet clear which stage is best for addressing DQ assessment in this kind of system.
The work presented in [18] focuses on measuring believability of data. It states that believability depends on the provenance of data, which refers to the sources of the DWS. In [19] the authors present DQ as a challenge in DW design and implementation. They remark the importance of quality and consistency checking in the data that comes from the sources. In this work data consistency is presented as a separate concept from DQ, differently from most works that present consistency as a particular aspect of DQ.
In [20] the authors address the quality process as part of the ETL tasks, presenting a proposal for DQ control and data cleaning. They propose rules for data transformation, which do the data cleaning, and a rule generator that facilitates the inclusion of DQ processes into the ETL tasks. In [21] they also focus on data cleaning in the ETL process, proposing a framework of configuration rules, which has two inputs: a data profiling used for selecting the data sources to be cleaned, and user opinions about already known data problems. In the same direction, the authors of [22] state that the ETL data cleaning tasks should identify the data problems and inconsistencies and perform the necessary transformations for assuring a minimum DQ for the data that will be loaded in the DW.
With respect to the data analysis stage, in [23] the authors claim that DQ is subjective from two perspectives: on the one hand, DQ problems may be more or less relevant depending on the decisions to be made by the data user, and on the other hand, data analysts may have previous personal knowledge and opinions about the quality of the received data. In [24] the authors emphasize the influence of the quality of data management in business intelligence (BI) systems throughout the whole process from the sources to the decision making stage. In [25] the impact that DQ has on decision making, the believability of organizations and customer satisfaction is highlighted. The authors consider that DQ problems in multidimensional repositories still need to be correctly organized and addressed.
In summary, the importance of DQ in DWS has been widely shown in the existing literature, since many researchers present the challenge of managing DQ in these systems. However, there is no consensus about how this task should be done. Most works have only addressed the problem of data cleaning in the ETL stage, ignoring the task of DQ assessment throughout the whole DWS lifecycle (or DWS components). In addition, although many works have tried to identify DQ dimensions for DWs, an agreed set of DQ dimensions has not been identified yet, which calls into question the possibility of defining a specific and unique set of DQ dimensions for DW environments.

How are Contexts defined and used in DWS?
DWS are determined by the dimensions and measures that describe the business. Based on this, some authors consider the context to be the dimensions that allow the analysis of the measures. In particular, in [19] the authors present the DW facts as the numerical data needed for satisfying all calculation options that are of interest to the end user, and they say that dimensional tables provide context to these facts. Under this view, the stage of requirements elicitation and conceptual analysis in DWS design has great influence on the context that the DW dimensions will give to the DW measures.
Some works focus on the stage of source identification and selection [26][27][28][29], stating that organizational data coming from traditional DW sources, i.e. operational data, can be exploited more successfully if they are combined with data external to the DW. In [28][29] the authors consider that the context is defined by the content of documents. According to [28], the most common information sources for users are the intranet of their organization, the web, and e-mails. These unstructured data are often related to the DW entities and relationships, constituting the context. Meanwhile, in [30] the authors consider the context in the ETL stage, grouping the users that share interests in this stage and contextualizing the ETL process according to the experts' different points of view.
In the data analysis stage the use of contexts is also emphasized. In [29] the authors claim that the context that allows the data stored in the DW to be understood is usually found in separate documents, whose value is not properly exploited. These documents provide information about the DW facts, describing their context. In [31], with another approach to the concept of context, the authors propose the use of rules for representing knowledge as a means of taking context into account during OLAP. In [28][29] the authors state that the OLAP tasks determine the context of the decision making process, and that this context is composed of all the information involved in these tasks. They claim that contextual information must be taken into account during DW exploitation. However, although several context-sensitive applications have been proposed, research on integrating context into DWS is still insufficient.
Finally, we also studied other works from which we obtained relevant results but which we do not detail here [32] [43].

How are Contexts considered in DQ assessment?
The concept of fitness for use is widely used in the DQ literature. Several authors consider that this concept means that DQ depends on the context where data are used. The works presented in [21][44][45][46][47][48] are based on, or highlight, this approach. In [48] the authors explain that the information consumers' needs and the circumstances around the use of the information should be taken into account. In [46] the authors mention that most of the existing approaches depend on the context; however, the contextual dimension is barely represented in DQ frameworks.
The work in [49] shows that, besides the fact that users may have different data requirements, their satisfactory quality level and the elements that define their context vary according to each one's perspective. In this case, the context is determined by the users' requirements. In [50] the authors also focus on users and their needs. In this case, the contextualized objects are the queries made by users, and the context for these objects is determined by user information (geo-location, habits, interests and needs) and by query data and processing quality.
In [51] the authors explore the contextual nature of information and use the term Context to refer to a group of users with a specific purpose in a specific task. They consider that information quality should be evaluated taking the information construction process into account, not using a static measure. They propose to consider the states from data to knowledge, going through the different contexts. Additionally, in [48] DQ and information quality are considered different. In particular, for the authors, information quality metrics are necessarily context-dependent while DQ metrics may be absolute.
The work in [47], like many works found in the literature, is based on the classification of DQ dimensions given in [3], where DQ dimensions are classified into: intrinsic (based on the degree to which data values conform to real values), contextual (based on the degree to which data are applicable to the user's task), representational (based on the degree to which data are presented in an intelligible and clear way) and accessibility (based on the degree to which data are available). For example, accuracy and objectivity are objective dimensions since they are intrinsic to the data and independent of the context where they are used. However, not all DQ dimensions can be objectively measured, since dimensions such as relevance and believability tend to vary according to the context where they are used. In [47] the authors claim that, because DQ depends on the context, there is no unified set of DQ dimensions despite the wide discussion of DQ dimensions in the literature.
Finally, we also studied other works from which we obtained relevant results but which we do not detail here [ [57].

How may contexts be used for assessing DQ in DW?
Only a few works were found that attempt to answer the main question presented at the beginning of this section.
In [45] a proposal is presented in which DQ is integrated into the whole DW development process. The authors mention that a DQ dimension may involve objective information about data and their transformation process, but it may also be subjective, since different users may have different quality requirements. They also refer to the concept of fitness for use for describing DQ, highlighting the relativity of quality and the fact that certain data may be suitable for one use and not for another. This work shows the importance of considering context when evaluating DQ in DW, although the authors do not use this term.
In [44] the authors present a proposal for managing the data cleaning process in the ETL stage. They define DQ in DW as the degree to which data satisfy the specific user needs, and as users' interests vary, they state that DQ needs to be adaptable. They implicitly consider user context for evaluating DQ. For the authors of [58], DQ is context-sensitive by nature and must be evaluated in the context of the business domain where data will be used. Additionally, they consider that DQ evaluation research has still much to do, as a relevant set of DQ factors for decision support systems has not yet been identified and the context has not been considered for DQ measurement in these systems. In these two works, the authors mention that additional factors should be considered in decision support systems, such as the role of the decision makers, their preferences and interests, and the hierarchies of the decisions. The authors remark the importance of user and decision contexts for evaluating DQ.
Finally, we also studied other works from which we obtained relevant results but which we do not detail here [59][60][61][62].

Relevant Results
In this section we present other results that are considered of interest to the general problem of the evaluation of DQ taking into account the context in DWS.
In Table 1 we show a classification of articles according to the definition of context that different authors present in the literature. The table shows for whom or what the context is defined and how this context is determined. The column Object (contextualized object) indicates for whom or for what the context is defined. The column Context (contextualizing object) shows what defines the context, and the column Context Components describes how such context (or contextualizing object) is composed.

Table 1: Classification of articles according to the definition of context. Each entry gives the reference, followed by the object and context cells that accompany it and by its context components.
• [31]: Data cube information, user preferences and aggregation function information.
• [59]: Data; data's dimensional aspects, relations between hierarchical categories, dimensional rules and constraints.
• [46]: Environment; information of the environment, the user and the task that will be performed.
• [58]: Decision making; decision categories (e.g. credit, tactical, etc.), weight of the DQ factor over the decision category and the user confidence for each DQ factor.
• [52]: Data or Information Quality; Users; user requirements.
• [47] [51] [53] [63]: Data or Information Quality; Users; the purpose of users in a specific task.
• [50]: Queries; User and Request specificities; user information (geo-location, habits, interests and needs), similar queries, data and processing quality.
• [32]: Users; features and user preferences.
• [54]: Environment; source identification (sensor), identification of the entity for which the sensor picks up data, time at which data are collected, and data.
• [55]: Data quality, Data and Sensor; features (CPU, memory, energy, etc.) and sensor measures, user preferences, information on the task to be performed.
• [56]: Data Quality Requirements; constraints defined by rules.
• [60]: Multidimensional Recommendation System; Users; user profile and preferences.
• Data Sources; Sources structure; all data sources with similar structure (columns, records, etc.).
• [43]: User Documents; document content.

We can see in Table 1 that there is a diversity of components for defining the contextualizing object. This is because the contextualizing object is not unique and depends on the contextualized object. For example, in [52] the contextualized object is data quality, the contextualizing object is the user and it is determined by the user requirements. In [50] the authors consider that the large amount of information generated by users and applications is often underused, while the information retrieved from the data sources is often irrelevant to the user needs. As a result, processing complex requests on such data sources is costly and does not guarantee user satisfaction. The authors therefore propose to take into account the context of the queries submitted to the different data sources. The works presented in [54][55] focus on sensed contexts, a concept used in ubiquitous (or pervasive) computing environments. In this kind of environment, where data sources are sensors, taking the context into account is of great importance. In [57] the authors also focus on an environment of multiple sources, and they consider that the same entity is stored redundantly in different data sources. Therefore, the authors consider that entities with the same structure represent the same entity, and such structure (or syntax, according to the authors) determines the context of data sources. In turn, in [60] the contextual information is integrated into a multidimensional recommendation environment, to be used in OLAP systems.
Finally, an interesting approach, shared by various authors, is the one presented in the works that define the context of the DW measures as the corresponding DW dimensions [19][61]. DW measures are data, and therefore these articles could be grouped with the articles that consider data contexts. However, it is important to consider these works separately because they present a specific approach. Moreover, in [29] the authors also focus on the context of the DW measures. In this case, however, the context is determined by the relevant documents of the organization and their contents.
From Table 1 we extract the most important contextualized objects for our work. Table 2 lists these objects and the number of analyzed articles that focus on them. Table 3, in turn, highlights the contextualizing objects used, again showing the most important objects for this research. It is interesting to note that objects that are contextualized in some cases are contextualizing objects in others. Examples are Users, Quality of data/information and Data. The majority of the analyzed articles intend to give context to the DW measures and to Data/Information quality, while the DW dimensions and Users are most frequently used by researchers to give context to different objects.

DQ Assessment in DWS based on Contexts
The conclusions posed by the authors of the analyzed bibliography are our starting point to develop a proposal that defines a framework where data quality problems can be identified in Data Warehouse Systems. Besides, it is interesting to consider this framework as the basis for quality management. For this, we take into account the different components that are part of this kind of systems (shown in Fig. 2) and the different contexts that have influence over them.
One of the purposes of this work is to define the contexts that are present throughout the whole lifecycle of a DWS. These contexts are of interest for data quality assessment, since they may affect it. Therefore, this proposal focuses on defining the different contexts that data go through, from the moment they are loaded in the DW until they are used by the end users. The view that data go through different contexts while being transported from the sources to the users is shared with the authors of [51]. It is important to note that this work does not focus on quality issues in the data sources and ETL layers, since much research has already addressed them. For example, the works in [19][21][27] focus on data quality in DW sources, and the works in [21][44][63] focus on quality, especially data cleaning, in ETL processes. In particular, the works in [30][64] focus on the contexts of the ETL processes and the data sources, respectively.

Context in DWS Components
In this section, we present and define the contexts for the components of the DW layer (shown in Fig. 2). Each component, with its respective context, is presented in Fig. 3. In the following we describe the elements that determine each one of these contexts.
• Context in the Data Warehouse (DWC): It is defined by data in the DW, and also by documents, e-mails and other data external to the DW that are related to data stored in it. The external data may be tables of personal databases of different users within the organization, which may be linked to the DW data. Documents and e-mails that are shared within the organization may also contain information strongly related to data stored in the DW.
• Context in the Data Mart (DMC): A Data Mart contains a subset of the data stored in the DW, which have been transformed, and is directed to a specific analysis domain (e.g. a section of the organization). Hence, for us, the DMC is determined by a set of rules that describe properties, constraints and quality requirements specific to the corresponding analysis domain.
• Context in Use (CiU): The CiU is the context in the data presentation component, and is determined by data that describe the end user. These data may be geographical location, language, role, requirements (of data or quality), etc. For example, the DQ requirements could be a minimum level of data accuracy or data completeness.
Information regarding contexts could be stored in the metadata repository shown in Fig. 3. An example of such information is the set of domain rules that determine the context in the Data Mart component.
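As an illustration of how such context information might be kept in a metadata repository, the following minimal sketch represents a DMC as a list of domain rules and a CiU as a user description with a DQ requirement. All class, function and attribute names (DomainRule, ContextInUse, MetadataRepository, completeness, violations), the sample rule and the 0.9 threshold are hypothetical; the text does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


@dataclass
class DomainRule:
    """One rule of a Data Mart context (DMC): a named constraint on a tuple."""
    name: str
    holds: Callable[[Row], bool]


@dataclass
class ContextInUse:
    """Context in Use (CiU): data describing the end user."""
    role: str
    language: str
    min_completeness: float  # assumed DQ requirement, a ratio in [0, 1]


@dataclass
class MetadataRepository:
    """Assumed layout: DMC rules per analysis domain, CiU per user."""
    dmc_rules: Dict[str, List[DomainRule]] = field(default_factory=dict)
    ciu: Dict[str, ContextInUse] = field(default_factory=dict)


def completeness(rows: List[Row], attribute: str) -> float:
    """Ratio of tuples whose attribute is present (1.0 for no tuples, by convention)."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(attribute) not in (None, ""))
    return filled / len(rows)


def violations(rows: List[Row], rules: List[DomainRule]) -> List[Tuple[int, str]]:
    """Return (tuple index, rule name) for every DMC rule a tuple breaks."""
    return [(i, rule.name) for i, r in enumerate(rows)
            for rule in rules if not rule.holds(r)]


repo = MetadataRepository()
repo.dmc_rules["Sales"] = [DomainRule("city_not_empty", lambda r: bool(r.get("city")))]
repo.ciu["analyst_1"] = ContextInUse(role="sales analyst", language="es",
                                     min_completeness=0.9)

branches = [{"branch": "b1", "city": "Rocha"}, {"branch": "b2", "city": None}]
broken = violations(branches, repo.dmc_rules["Sales"])
meets_user_requirement = (completeness(branches, "city")
                          >= repo.ciu["analyst_1"].min_completeness)
```

Keeping the rules and the user requirements as data, rather than hard-coding them in the assessment functions, is one way to let the same metric be evaluated under different contexts.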

Data Quality according to Contexts
For the quality assessment in the DWS components, taking into account the contexts presented above, this work is based on two quality approaches that are presented below. Fig. 4 shows where these approaches are applied in the DWS.
• Crosby's Meeting Requirements [65]: This approach emphasizes the compliance with the system requirements and it focuses on prevention rather than correction, where the only way to achieve good performance is having zero defects. The essence of this approach is that the requirements must be known and they must be translated into measurable characteristics of the product/service. Therefore, the quality of a product/service is equivalent to the satisfaction of the specification criteria by all the measurable characteristics of such product/service [66]. The Meeting Requirements approach is applied to evaluate data quality in the Data Warehouse and Data Marts components.
• Juran's Fitness for Use [4]: This approach focuses on meeting the needs of the user through the suitability of the product for its intended use. The Fitness for Use approach is applied to assess data quality in the data presentation layer.
The concept of quality applied in the data presentation layer of the DWS coincides with the definition of quality in use of ISO/IEC 25010 [67]. This standard defines quality in use as the quality from the point of view of the user. In addition, the standard defines a set of quality characteristics, some of which are divided into sub-characteristics. An example of a quality characteristic is usability, whose sub-characteristics, according to the standard, are effectiveness, efficiency and satisfaction. However, what is most relevant for this research is not these concepts but the fact that the quality in use model assesses quality in a particular context of use that depends on the point of view of the user.
Although the ISO/IEC 25010 [67] focuses on a quality model for Software, it can be adapted and applied to other areas. An example of this is the application of this norm for the definition of a quality in use model in Web portals [68].

DQ Metrics Proposed for a Case Study
In this section, we provide a case study in which the concepts introduced before are applied to define the DQ metrics. The example represents a supermarket chain that holds information about sales and promotions in a DW. There are two domains of data analysis, Sales and Advertising, each with different requirements. These requirements give rise to the conceptual multidimensional design described in the following paragraph.
Fig. 6 and Fig. 7 present the conceptual schema corresponding to the example. We use the CMDM model [69] to represent the dimensions and the dimensional relationships. First, Fig. 6 shows the dimensions involved in the case study: Product, Time, Branch and Promotion, with their respective hierarchies. In CMDM each dimension is represented as a box that includes hierarchies, which are composed of levels connected by arrows that represent an N-to-1 relationship. Fig. 7 presents the dimensional relationships Sales and Discounts. In CMDM, dimensional relationships represent the crossing between dimensions and the involved measures (identified by an arrow) that give rise to the data cubes, i.e. a dimensional relationship represents all the data cubes that may be built whose axes are formed by one level of each dimension represented in the relationship.
In the dimension table Product (Fig. 8), the product identifier, its name, the family to which it belongs, the product type and its category are presented. In the dimension table Time (Fig. 9), each tuple contains the date, the month and the corresponding year. In the dimension table Branch (Fig. 10), each tuple contains the identifier of the branch, its name, its city, state and country. Finally, in the dimension table Promotion (Fig. 11), each tuple contains the promotion identifier, its name, its city, state and country, the duration period of the promotion (start and end dates) and the discount percentage of the promotion.
The fact tables Sales and Discounts are shown in Fig. 12 and Fig. 13, respectively. The former has an identifier, the product identifier, the sale day, the branch identifier and the sold amount of the product. The latter has an identifier, the promotion identifier, the product identifier, the sale day, the branch identifier and the sold amount of the promotion.

DQ in the DW
In this section we show how the different elements that give context to the DW component determine the quality in this component. In the following, two examples are presented.

Example 1. Context: Document of the organization
In this example, the context is determined by documents that belong to the organization. Table 4 shows, for the quality dimension accuracy and its quality factor semantic accuracy, the quality assessment of each tuple of the dimension table Branch with regard to the attribute cityName. This attribute represents the city to which each branch of the supermarket belongs. A document of the organization acts as the reference for the assessment: for each city, it lists the branches that belong to it. Part of the document corresponding to the supermarket Super is shown in Fig. 14. For a branch bi whose stored city is cx, the DQ metric searches the referential document for the city cy to which bi actually belongs. If cx = cy, the tuple is correct; otherwise, it is an incorrect tuple for the branch bi.

DQ Metric
Result type: {0,1}

Figure 14: Structure of the document that contains the branches in each city.
For the quality assessment of the dimension table Branch with regard to the attribute cityName, we verify that each city stored in the dimension table is the city to which the considered branch actually belongs. For example, in Fig. 10 it can be seen that:
• branch 'Super_Rocha_31': according to the referential document of the organization, this branch belongs to the city 'Rocha'. The column cityName for this branch says 'Rocha', hence this tuple is correct.
• branch 'Supermarket': according to the referential document of the organization, this branch belongs to the city 'Salto'. However, the column cityName for this branch in the dimension table Branch says 'Maldonado', hence this tuple is incorrect.
Once all tuples in the dimension table Branch have been evaluated, the values obtained from the DQ metric dwq_Example1 are saved in the metadata repository. These metadata can be used later, for example to correct the dimension table Branch so that each branch is assigned the city to which it actually belongs.
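The check described in Example 1 can be sketched in Python as follows. This is only an illustrative sketch: the dictionary form of the referential document, the function names and the handling of a branch that is missing from the document are assumptions, not part of the proposal. The two tuples are the ones discussed above.

```python
# Referential document of the organization, in a hypothetical in-memory form:
# each city maps to the list of branches that belong to it.
REFERENCE_DOC = {
    "Rocha": ["Super_Rocha_31"],
    "Salto": ["Supermarket"],
}

# Excerpt of the dimension table Branch: (branchName, cityName) as stored in the DW.
BRANCH_ROWS = [
    ("Super_Rocha_31", "Rocha"),
    ("Supermarket", "Maldonado"),
]


def reference_city(branch_name, reference_doc):
    """Return the city cy that the referential document assigns to the branch, or None."""
    for city, branches in reference_doc.items():
        if branch_name in branches:
            return city
    return None


def dwq_example1(branch_name, stored_city, reference_doc):
    """Semantic accuracy of cityName for one tuple: 1 if cx == cy, otherwise 0.

    Assumption: a branch that does not appear in the document is scored 0.
    """
    return 1 if reference_city(branch_name, reference_doc) == stored_city else 0


# One quality value per tuple, ready to be stored as metadata.
results = {name: dwq_example1(name, city, REFERENCE_DOC) for name, city in BRANCH_ROWS}
```

The resulting dictionary plays the role of the values that would be saved in the metadata repository.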

Example 2. Context: DW dimensions
In this example, the context is determined by DW dimensions. In Fig. 15 we present the dimension tables that determine the context in this case. From the dimension table Branch we consider the level cityName (the cities to which the branches belong), and from the dimension table Promotion we consider the level cityName (the cities where the promotions are run). Table 5 shows, for the quality dimension consistency and the quality factor intra-relation integrity, the quality assessment of the fact table Discounts, comparing the city of the branch in which the promotion was applied with the city for which the promotion was created. In this case, the DQ metric verifies that the city of the branch and the city of the promotion are the same. For example, we can see the following in Fig. 13:
• tuple d1: this tuple has the promotion p1 and the branch identifier '28'. The promotion p1 was created for the city 'Florida', as shown in the dimension table Promotion in Fig. 11, and the branch with identifier '28' belongs to the city 'Florida', as shown in the dimension table Branch in Fig. 10. The cities match, so the information in this tuple is consistent. Therefore, the tuple d1 is correct.
• tuple d2: this tuple has the promotion p2 and the branch identifier '31'. The promotion p2 was created for the city 'Colonia', as shown in the dimension table Promotion in Fig. 11, and the branch with identifier '31' belongs to the city 'Rocha', as shown in the dimension table Branch in Fig. 10. The city of the promotion and the city of the branch in which the promotion was applied do not match, so the information in this tuple is not consistent. Therefore, the tuple d2 is incorrect.
As in the previous example, once all tuples in the fact table Discounts have been evaluated, the values obtained from the DQ metric dwq_Example2 are saved in the metadata repository.
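A minimal sketch of the consistency check of Example 2 is given below. The dictionary layout and the function name are hypothetical; only the identifiers and cities of the tuples d1 and d2 come from the example. Scoring a fact with an unknown branch or promotion as 0 is an assumption.

```python
# Context: the cityName level of the dimension tables Branch and Promotion.
BRANCH_CITY = {"28": "Florida", "31": "Rocha"}        # branch identifier -> cityName
PROMOTION_CITY = {"p1": "Florida", "p2": "Colonia"}   # promotion identifier -> cityName

# Excerpt of the fact table Discounts (only the columns needed by the metric).
DISCOUNTS = [
    {"id": "d1", "promotion": "p1", "branch": "28"},
    {"id": "d2", "promotion": "p2", "branch": "31"},
]


def dwq_example2(fact, branch_city, promotion_city):
    """Return 1 if the branch city equals the promotion city, otherwise 0.

    Assumption: a fact that references an unknown branch or promotion scores 0.
    """
    city_of_branch = branch_city.get(fact["branch"])
    city_of_promotion = promotion_city.get(fact["promotion"])
    if city_of_branch is None or city_of_promotion is None:
        return 0
    return 1 if city_of_branch == city_of_promotion else 0


results = {f["id"]: dwq_example2(f, BRANCH_CITY, PROMOTION_CITY) for f in DISCOUNTS}
```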

DQ in the DM
This section shows how analysis domain rules, which give context to each DM component, determine the quality in this component. We present two examples in which the domains considered are Sales and Advertising.

Example 3. Domain: Sales. Context: Domain Rule RSales
Table 6 shows, for the quality dimension accuracy and its quality factor syntactic accuracy, the quality assessment of the dimension table Branch with regard to the attribute branchName. This attribute represents the name of each branch. In this case, we consider the Sales domain, in which there is a domain rule called RSales that says the following: "The name of each branch must contain the name of the supermarket, the name of the city to which the branch belongs and the number of its identifier, in that order". In this domain, this rule is necessary because the data for the Data Mart must be integrated with other data that satisfy this rule. For example:
• branch with id '29': the name of this branch is 'Florida_29', it is in the city 'Florida' (as shown in the column cityName) and the identifier of the branch is the number '29', but the name of this branch does not contain the name of the supermarket, Super. Therefore, this tuple violates the rule RSales, so it is an incorrect tuple for the domain Sales.
• branch with id '31': the name of this branch is 'Super_Rocha_31', the supermarket's name is Super, this branch belongs to the city 'Rocha' (as shown in the column cityName) and the identifier of the branch is the number '31'. The name of this branch contains the information required by the rule RSales. Therefore, this tuple is correct for the domain Sales.
Once all tuples in the dimension table Branch have been evaluated, the values obtained from the DQ metric dmq_Example3 are saved in the metadata repository. These metadata can later be used.

Example 4. Domain: Advertising. Context: Domain Rule RAdv
Table 7 shows, for the quality dimension accuracy and its quality factor syntactic accuracy, the quality assessment of the dimension table Branch with regard to the attribute branchName. This attribute represents the name of each branch. In this case, we consider the Advertising domain, in which there is a domain rule called RAdv that says the following: "The name of each branch must contain the name of the city to which it belongs". In this domain, promotions must be organized by city, so this rule is necessary for organizing the marketing information. For example:
• branch with id '25': the name of this branch is 'Supermarket'. The name of the city to which this branch belongs ('Maldonado', as shown in the column cityName) is not in the name of the branch. Therefore, this tuple violates the domain rule RAdv, so it is incorrect for the domain Advertising.
• branch with id '29': the name of this branch is 'Florida_29'. The city to which this branch belongs is 'Florida' (as shown in the column cityName). The name of this branch contains the information required by the rule RAdv. Therefore, this tuple is correct for the domain Advertising.
It is important to highlight that some tuples of the dimension table Branch that are incorrect for the domain Sales (e.g. the branch with id '29') are correct for the domain Advertising. This is because different rules were applied to assess quality in the two domains. Therefore, different contexts (determined by domain rules) define different quality values. Once all tuples in the dimension table Branch have been evaluated, the values obtained from the DQ metric dmq_Example4 are saved in the metadata repository. These metadata can later be used.
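The effect of the two domain rules on the same tuples can be illustrated with the following sketch. The function names and the record layout are hypothetical. Reading "in that order" as an ordered substring match, with any characters allowed between the parts, is an assumption about how RSales would be checked. The values for branch '25' under RSales and branch '31' under RAdv are not stated in the examples; they simply follow from applying the rules as coded here.

```python
import re

SUPERMARKET = "Super"  # name of the supermarket used in the case study

# Excerpt of the dimension table Branch with the tuples discussed in Examples 3 and 4.
BRANCHES = [
    {"id": "25", "branchName": "Supermarket", "cityName": "Maldonado"},
    {"id": "29", "branchName": "Florida_29", "cityName": "Florida"},
    {"id": "31", "branchName": "Super_Rocha_31", "cityName": "Rocha"},
]


def r_sales(branch):
    """RSales: supermarket name, city name and identifier appear in the name, in that order."""
    parts = (SUPERMARKET, branch["cityName"], branch["id"])
    # Assumption: any characters (e.g. underscores) may separate the three parts.
    pattern = ".*".join(re.escape(part) for part in parts)
    return 1 if re.search(pattern, branch["branchName"]) else 0


def r_adv(branch):
    """RAdv: the branch name contains the name of the city to which the branch belongs."""
    return 1 if branch["cityName"] in branch["branchName"] else 0


dmq_example3 = {b["id"]: r_sales(b) for b in BRANCHES}  # context: domain Sales
dmq_example4 = {b["id"]: r_adv(b) for b in BRANCHES}    # context: domain Advertising
```

Comparing the two dictionaries for the branch with id '29' shows the point made above: the same tuple receives 0 under RSales and 1 under RAdv.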

DQ in Use
In this section we show how the different elements that give context to the data presentation component determine the quality in this component. In the following we present two examples that take into account users from different domains: Sales and Advertising.

Example 5. Domain: Sales. Context: User profile
Table 8 presents the DQ metric qiu_Example5, for the quality dimension freshness and its quality factor currency, which refers to how up to date the data are with respect to their source. The quality assessment is made on the fact table Sales (Fig. 12) with regard to the measure salesAmount. This DW measure represents the amount of sales made for each product in each branch on a given day. In this case, we consider that the fact table Sales has a column timestamp that indicates the last update of the value of the measure salesAmount. The quality assessment is performed in two different contexts: for the sales general manager of the supermarket chain and for the sales manager of a branch. Therefore, for the quality assessment, it is necessary to consider the profile of each user according to their requirements:
• user u1: since the sales general manager of the supermarket chain needs to compute different statistics, this user requires that the last data update was made at most one day before the data are used. In other words, the data must be at most 24 hours old.
• user u2: since the sales manager of a branch needs to perform different comparative analyses in real time, this user requires that the last data update was made at most one hour before the data are used. In other words, the data must be at most 1 hour old.
The quality requirement of the user u2 is more demanding than that of the user u1. Therefore, data that are valid for the user u2 are also valid for the user u1, whereas data that are valid for the user u1 are not necessarily valid for the user u2. In this case, although it is a quality requirement that influences the outcome of the DQ metric qiu_Example5, it is the user profile that determines which of the two requirements is taken into account. Thus, the user profile determines the outcome of the DQ metric qiu_Example5, and so the user determines the context in which the quality assessment is performed.

Example 6. Domain: Advertising. Context: Quality requirement
Table 9 presents the DQ metric qiu_Example6, for the quality dimension accuracy and its quality factor syntactic accuracy. The quality assessment is made on the dimension table Branch (Fig. 10). In this case, the context is determined by a quality requirement. This requirement is stated by the advertising manager, a user who belongs to the Advertising domain. The quality requirement says that 100% of the branches must have their name written correctly.
For the quality assessment, the DQ metric qiu_Example6 uses the quality values obtained by executing the DQ metric dmq_Example4, which was applied in the DM component (see Table 7).
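The two quality-in-use metrics can be sketched as follows, under several assumptions: the function names, the {0,1} result types and the timestamps are hypothetical, and the aggregation used for Example 6 (the ratio of correct tuples compared against the required 100%) is one plausible reading of the requirement, not a definition taken from Tables 8 and 9. The input to the second metric is the pair of dmq_Example4 values discussed in Example 4.

```python
from datetime import datetime, timedelta

# Example 5: maximum accepted age of salesAmount for each user profile.
MAX_AGE = {
    "u1": timedelta(hours=24),  # sales general manager of the chain
    "u2": timedelta(hours=1),   # sales manager of a branch
}


def qiu_example5(last_update, used_at, user):
    """Currency: 1 if the data are fresh enough for the given user profile, otherwise 0."""
    return 1 if used_at - last_update <= MAX_AGE[user] else 0


def qiu_example6(dmq_example4_values, required_ratio=1.0):
    """1 if the share of correctly named branches meets the requirement, otherwise 0.

    Assumption: with no evaluated tuples the requirement is treated as not met.
    """
    values = list(dmq_example4_values)
    if not values:
        return 0
    return 1 if sum(values) / len(values) >= required_ratio else 0


# Hypothetical timestamps: the measure was updated three hours before it is used.
last_update = datetime(2020, 1, 2, 9, 0)
used_at = datetime(2020, 1, 2, 12, 0)
currency = {user: qiu_example5(last_update, used_at, user) for user in MAX_AGE}

# dmq_Example4 values for the branches with id '25' and '29' (see Example 4).
requirement_met = qiu_example6({"25": 0, "29": 1}.values())
```

With these inputs the same tuple is acceptable for u1 but not for u2, which mirrors the argument that the user profile selects the requirement that is applied.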

Case Study Summary
Table 10 summarizes all the examples developed above. The first column gives the DWS component in which the quality assessment was performed, followed by the number of the example. Next, the contextualized and contextualizing objects are shown, and finally the quality dimension and the quality factor assessed.
To define DQ at the different DWS components, this work relies on two quality approaches: Meeting Requirements and Fitness for Use. The former is applied for data quality assessment in two components, DW and DM, with an emphasis on compliance with the requirements. The latter is applied for data quality assessment in the data presentation layer, with an emphasis on meeting the user needs.
In addition, this work presents a literature review built around the research question "How may contexts be used for assessing DQ in DW?", which is split into three further research questions. The results of the review are discussed question by question, and some specific results regarding the different approaches to context are also presented. As shown in this paper, there is no consensus in the literature regarding the management of context for data quality assessment in DWS.
To show how contexts are used in quality assessment, we propose a case study in which DWS components are considered and DQ metrics are defined for them. Once the component to be studied is selected, the quality dimension and quality factor to be assessed are chosen. Then, the context and the context-based quality metric are defined. Different contexts may determine different quality values for the same metric.
DQ assessment in DWS is not a new problem, and a large body of literature investigates and raises the need for managing DQ in this type of system. However, we found no proposals that follow the approach presented in this research. This first proposal defines a framework in which DQ problems are located within the DWS and which serves as a basis for integral quality management (assessment, improvement, maintenance, etc.). Therefore, this research is a starting point for new research addressing new challenges.
The immediate challenge arising from the proposal presented above is to define a formal model of the system, of the contexts and of the quality defined for the DWS components. This will allow a detailed and in-depth specification of the proposed solution. In addition, it is of interest to analyse the relationship between DQ and quality in use, addressed by the Meeting Requirements and Fitness for Use approaches, respectively.