What we have learned last time?
In Day 1 we have learned following things
|
Agenda for Day 2
· How
to design Data Warehouse?
o Now
the question is how to design DW?
o How
Dimensions and Facts are different from traditional tables?
o Data
warehouse design techniques.
o Advantages
and Disadvantages of Star schema and snow flake design.
· Lab
2, 3, 4 and 5- Explore Control Flow.
o Lab
2 - Execute SQL Task.
o Lab
3 – Backup database Task.
o Lab
4 – Script Task and Variables.
How to design Data Warehouse?
We already have a basic knowledge about Data Warehouse by now. It’s typically a database holding data in a denormalized manner. Data from multiple places (multiple data sources) are collected and stored into DW (Data warehouse). SSIS (SQL server integration services) is the Microsoft’s business intelligence platform for performing this task.
Now the question is how to design DW?
In order to design data warehouse you have to ask yourself two questions.
|
DW Basically contains two main types of tables. One is a fact table which contain measurement of a business process and other is a dimension table which will be used for filtering, grouping and sorting of our data. In the above example,
- Location, Time (one which will hold years) and customer will become your Dimensions
- Sale become your Facts.
Note: Facts and Dimensions are simply tables.
Designing Data Warehouse is all about,
- understanding the Dimensions and Facts in the business
- and creating appropriate tables.
How Dimensions and Facts are different from traditional tables
In order to understand this, let’s have an example. Look at the database diagram of tradition Normalized database.
Now let see how to make DW from above one
As you can see,
- Traditional tables are converted into denormalized Dimensions.
- Fact table created contains only 2 things
- Numbers which can be used for calculation
- References for dimension table keys.
Data warehouse design techniques
Above technique where Fact table references Dimension tables and Dimension tables are completely denormalized leads to a database diagram which looks like star and hence called as Star schema design. |
In Star schema design, one fact table will be surrounded by many dimension tables.
There is one more way of designing data ware houses. It’s called Snowflake design. In this normalization of some dimension tables is considered to some extent. |
Advantages and Disadvantages of Star schema and Snowflake design
- Star schema design we get more performance because dimensions are denormalized. In Snowflake performance will be comparatively less compared to star flake.
- In Star schema data management will be difficult compare to snowflake because data redundancy is more.
- In Star schema queries will be simpler because joins will be comparatively less.
Conclusion – If data integrity and data redundancy is big concern snowflake is best. If performance is the major concern star schema fits best.
Lab 2, 3, 4 and 5- Explore Control Flow.
Now it’s time for our labs. In the last lab we have created a simple SSIS package demonstrating the ETL process. In this article we will continue our lab and explore some more Control Flow tasks.
Lab 2 - Execute SQL Task
This task let us custom SQL Script via our package.
Step 1. Create Table
Create a table in your SQL database as follows
Note: EmployeeId is identity column.
Step 2. Create SSIS Project
Create a new Empty SSIS project just like one we created in Lab 1.
Step 3. Create Connection Manager
3.1 Right click connection manager and say new ADO.NET connection.
3.2 Click New
3.3 Enter Server Name, Enter Credential, and Select Database and click OK.
3.4 Click Ok again.
3.5 Rename connection manager to Lab2Connection.
Step 4. Add Execute SQL Task
In SSIS Designer by default control flow is selected. Take “Execute SQL Task” from the toolbox and add it in the designer.
Step 5. Configure “Execute SQL Task”
5.1 Double click the task, it will open up “Execute SQL Task Editor”
5.2 Make sure the left section General is selected. In the right section under SQL Statement group, change connection type to ADO.NET from default OLEDB.
5.3 Next set Connection property to one created in Step 3.
5.4 Next is SQLSourceType. It support three values,
5.4.1 File connection – Let us execute T-SQL Statement stored in a file.
5.4.2 Variable – Let us execute T-SQL statement stored in one of the variable.
5.4.3 Direct Input- let us provide T-SQL statement which we want to execute
Note: On selecting one of the above value, a dynamic option will be displayed in the property window. For “Direct Input” a property called “SQL Statement”, For Variable a property called “Source Variable” and for File Connection property called “File connection” will be dispayed.
For our select “Direct Input”.
5.5 Click “Sql Statement”. A button with three dots in it will appear. Click it. It will popup “Enter SQL Query” dialog box. Enter query into it and click ok.
5.6 Click Ok in the “Execute SQL Task Editor” window.
Step 6. Execute Package
Press F5 and execute the package.
Step 7. Test the execution
Open the sql server database and confirm record is instered.
Note: We are using hard coded query for demonstration. You can check the identity column and confirm whether actually values are inserted or not.
Note: We are using hard coded query for demonstration. You can check the identity column and confirm whether actually values are inserted or not.
Step 8. Prepare for next lab
Grab some food, take some rest and be prepare for next lab.
Step 3. Add Backup Database Task
Take “Backup database task” from toolbox and add it to control flow designer window
Step 4. Configure Backup Database Task
4.1 Double click the task, it will pop up the configure window.
4.2 The first thing to configure in connection in the top. Select the connection manager to one created in step 3.
Note: A very strange thing will happen here. You will not find the connection manager (created in step 3) in the drop down. For that,
4.2.1 Close the window.
4.2.2 Right click your connection manager and go to properties.
4.2.3 Check the value of Qualifier property. It will be something like this, “System.Data.SqlClient.SqlConnection, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089”.
4.2.4 Change it to Simple “SQL”.
4.2.5 Double click the Backup task again. Now you will find the connection manager in dropdown box.
4.3 Just below connection you will find Database(s) option. Select one or more database from the list.
4.4 Let other setting as it is, only select the path where you want to keep database backups.
4.5 Click Ok
Step 5. Execute the Package
Press F5 and execute the package.
Step 6. Test the execution
Go to the chosen path and see if backup is created or not.
If backup file exist it’s time for next lab.
Lab 4 – Script Task and Variables
In this lab we will learn about two things.
- Script task – It let us perform something which is not possible using any of the existing SSIS tasks. It let us write code in C# and do custom operations.
- Variables – We will learn how to declare variables, how to assign values to them, and how to use them inside Script Task.
Note: Once the variable is created and assign some values we can use it as an input for many tasks. For instance, as an input for Execute SQL Task or may be used inside script task which we are going to do in this lab.
Step 1. Create SSIS Project
Create a new Empty SSIS project.
Step 2. Create Variable and Assign Value
2.1 Do either one of these,
- Right click the control flow and select Variables.
- Click the blue button located in the right-top corner of SSIS designer.
It will open up variable window.
2.2 Click on Add Variable button
2.3 Define the variable and assign the value.
Step 3. Add Script task to control flow
Take Script task from the toolbox and add it control flow
Step 4. Configure script task.
4.1 Double click the script task. It will open Script task editor.
4.2 Next we have to configure what variable we want to pass to the script. We can pass variable as either read only variable or read write variable. Click the triple dot button in front of either ReadOnlyVariable or ReadWriteVariable.
Select the variables and click OK.
Select the variables and click OK.
4.3 Next, click on edit script.
It will make a new instance of Visual studio open up with a “.cs” extension file.
It will make a new instance of Visual studio open up with a “.cs” extension file.
4.4 In the main Method write the following code.
public void Main() { // TODO: Add your code here MessageBox.Show(Dts.Variables["Website"].Value.ToString()); }
4.5 Press Ctrl+Save and save the file.
4.6 Close the Current Visual studio and go back to Sql server data tools where “script task editor” is open.
4.7 Click OK.
Step 5. Execute package.
Press F5 and execute the application.
No comments:
Post a Comment