When deciding how to build the Data Warehouse, managers should keep in mind that the objective of Business Intelligence (BI) is to support decision making. For a BI application to become a reality, it must build in large part on the available tools (data cleansing, data marts, reporting) and be aligned with the business objectives.
The Orion Star company holds a large amount of data on its sales. These data can provide a competitive advantage in business decision making. For example, if the company knows when sales fall, it can decide to run a promotion in that period; if it knows which products sell best, it can reinvest in them. Identifying the Business Intelligence, however, is not without risk. It is important that people who know the company understand the business requirements before the company fully invests in it. If this step is neglected, the warehouse is likely to be inefficient or to fail.
Within this context, we develop an appropriate methodology for answering the relevant business questions in a scientific manner. The remainder of this report is organised as follows: Chapter 1 presents a methodology for cleansing the data, Chapter 2 presents a methodology for constructing the Data Mart for the business scenario, and Chapter 3 presents and discusses the business intelligence results obtained.
Identify Business Intelligence
To start with, alongside the building of the Data Mart, data cleansing plays a vital role in Business Intelligence. Cleansing can be defined as the conversion of data, through transformation, into a format appropriate for analysis. It also indicates whether the data can be used at all. The cleaning checks must therefore be performed carefully before the Data Mart can be built.
It is important to note that the operational data of even the healthiest corporations contain some level of error. If the data are incorrect, the reports will be wrong and, hence, the company will make wrong business decisions. Providing accurate data is therefore essential to the viability of our project.
The next part of this report therefore focuses on the methods for cleaning the data and for determining the Business Intelligence.
Determine your business questions
In summary, the business questions of this scenario, together with the company’s needs, determine which variables we will use.
Using SAS, we then apply suitable techniques to clean the dirty data. As with any such task, the process should be planned carefully. The next part of this report sets out a step-by-step methodology for cleansing the data, based on Cody’s book (Cody, 1999).
Start by appending the flat files
Step 1: The raw data need to be imported into SAS. The relevant code can be seen in Figure 1 above. Data can be imported into SAS in many ways. Since our purpose is to provide accurate data, we considered the SAS INFILE statement the most appropriate choice for this assignment. Reading the file in a DATA step with INFILE allows the user to assign the length and the format of each variable and, hence, to manipulate the data more efficiently.
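As a minimal sketch of this kind of import (not the code from Figure 1), the DATA step below reads a comma-delimited flat file with a header row. The file path, the data set name work.customer_raw, the file layout, and every variable except Gender are hypothetical placeholders.

```sas
/* Sketch only: path, data set name, layout and most variables are hypothetical. */
data work.customer_raw;
    /* DSD reads comma-delimited values; FIRSTOBS=2 skips an assumed header row; */
    /* TRUNCOVER keeps SAS from moving to the next record when a line is short.  */
    infile 'C:\orion\raw\customer.csv' dsd firstobs=2 truncover;

    /* Declare lengths before INPUT so character values are not truncated. */
    length Customer_ID 8 Gender $ 1 Customer_Name $ 40;

    /* Read the date with an informat and display it with a format. */
    informat Birth_Date date9.;
    format   Birth_Date date9.;

    input Customer_ID Gender Customer_Name Birth_Date;
run;
```

Declaring lengths and informats explicitly, rather than letting SAS guess them, is what gives the control over variable attributes described above.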
We also assign the permanent library which contains all data sets:
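A library assignment of this kind might look as follows. This is a sketch only: the libref orion and the folder path are assumptions, not the values used in the project.

```sas
/* Hypothetical libref and folder; the folder must already exist. */
/* A libref may be at most eight characters long.                 */
libname orion 'C:\orion\data';

/* Data sets written with a two-level name such as orion.customer */
/* are stored in that folder and persist after the session ends.  */
```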
Test the data: Assess the entire data set for missing values, duplicates, etc.
With the data in SAS, it is essential that we develop techniques to identify and locate values that cannot be used for the Data Mart. Such values may be missing, incorrect, or duplicated. During this process we use several techniques for summarising and visualising the data (exploratory data analysis, EDA).
EDA will help us to clean the data. For categorical variables we use a non-graphical technique, namely the tabulation of frequencies, because some variables have many distinct values and a graphical analysis would therefore be unsatisfactory. The same applies to numeric variables, where we prefer PROC MEANS to a boxplot because we are not able to set ranges (e.g. for Total_Retail_Price).
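The numeric check can be sketched as below. The data set work.price_demo and its rows are invented purely for illustration and are not Orion Star values; only the variable name Total_Retail_Price comes from this report.

```sas
/* Illustrative rows only; not taken from the Orion Star data. */
data work.price_demo;
    input Total_Retail_Price;
    datalines;
16.50
247.50
.
28.30
-5.00
;
run;

/* N and NMISS count present and missing values; MIN and MAX expose */
/* out-of-range values such as a negative price.                    */
proc means data=work.price_demo n nmiss min max mean maxdec=2;
    var Total_Retail_Price;
run;
```

Reading NMISS together with MIN and MAX gives a quick first impression of missing and implausible values without having to define plotting ranges.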
To visualise the results we use PROC GCHART. It is important to note, however, that these procedures are not appropriate for large data sets.
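A bar chart of frequencies could be produced along the following lines. This is a sketch under assumptions: the data set work.gender_demo and its rows are invented for illustration, and PROC GCHART requires a SAS/GRAPH licence.

```sas
/* Illustrative rows only. PROC GCHART requires SAS/GRAPH. */
data work.gender_demo;
    input Gender $;
    datalines;
F
M
M
F
M
;
run;

proc gchart data=work.gender_demo;
    vbar Gender / type=freq;   /* one vertical bar per distinct value */
run;
quit;
```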
To test the accuracy of the data, SAS offers several procedures that support data quality checks.
Step 2: PROC FREQ. The data sets contain both numeric and character variables. First, we check the character variables for missing values by running PROC FREQ, which tells us whether a variable has missing values and, if so, how many. From the output below we note that the variable Gender has no missing values and, hence, could be used if necessary.
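The check described in this step can be sketched as follows. The data set work.customer_demo is a made-up example that deliberately contains one missing value, so that the effect of the MISSING option is visible; it does not reproduce the project data, in which Gender has no missing values.

```sas
/* Illustrative rows only. In list input, a lone period is read */
/* as a missing value, also for character variables.            */
data work.customer_demo;
    input Gender $;
    datalines;
F
M
.
M
;
run;

/* MISSING shows missing values as their own row in the table.  */
/* Without it, PROC FREQ only prints a "Frequency Missing"      */
/* count beneath the table.                                     */
proc freq data=work.customer_demo;
    tables Gender / missing;
run;
```

The same table also lists every distinct value of the variable, so invalid codes (for example a lower-case or misspelled category) show up alongside the missing count.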