IN2BI, Microsoft Business Intelligence
data, stories & insights

DWH Deck 2.0

In my presentation at the DWH Automation conference in Amsterdam on 20 September 2012 I gave a demo of the new version of the DWH Deck: DWH Deck 2.0. The slides of that presentation are included in this post.

This new release is now available for interested parties.
(If you are interested, send me an e-mail: marco@in2bi.nl)

The DWH Deck is a simple tool that allows you to build and maintain your data warehouse better and faster. The basic idea is best described by looking at the input and the output:

INPUT:

  • Metadata that describes the source object
  • Templates that describe the target table and load pattern
    (I have provided 9, but you can change these and add additional templates)

OUTPUT:

  • SQL Statements to create target tables and additional objects
  • SQL Statements to create stored procedures to load the data
  • BIML Files to create SSIS Packages that will handle the data load
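
For illustration, the generated DDL for a Customers staging job could look roughly like the sketch below (the stg schema, column types and housekeeping columns are assumptions; the actual statement depends on the template you select):

    -- Illustrative sketch of a generated staging table (names are assumptions)
    CREATE TABLE [stg].[Customers] (
        [CustomerID]   NCHAR(5)     NOT NULL,
        [CompanyName]  NVARCHAR(40) NOT NULL,
        [City]         NVARCHAR(15) NULL,
        [AuditID]      INT          NOT NULL,   -- id of the load run
        [RecordSource] VARCHAR(50)  NOT NULL    -- name of the source system
    );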
     

In these blog posts you’ll find additional information:

  • Thursday, 15 November 2012

    DWH Deck 2.0: How to change or add templates

    In previous blog posts I described the basics of working with the DWH Deck. In this post I’ll dive a bit deeper into the templates.

    Importance of templates

    The template concept is the basis of the DWH Deck:
    The DWH Deck combines the information from the source system with the selected templates to create the target data warehouse and the necessary ETL.


    The template tab of the DWH Deck has three areas:

    1. The area that describes the naming conventions you want to use for your target schema, target table and job.
    2. The area where you can define additional (housekeeping) columns that you want to add to the target table:
      • AuditID
      • Start date
      • End date
      • Record source
      • Surrogate key
    3. The template statement area, where you can edit the statement of the selected template.
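
    For area 2, a target dimension table with all of these housekeeping columns switched on could look roughly like this (column and schema names are illustrative; the actual names follow the conventions you define in area 1):

        -- Illustrative target table with housekeeping columns (names are assumptions)
        CREATE TABLE [dim].[Customer] (
            [dimCustomerKey] INT IDENTITY(1,1) NOT NULL,  -- surrogate key
            [CustomerID]     NCHAR(5)          NOT NULL,  -- business key from the source
            [CompanyName]    NVARCHAR(40)      NULL,
            [AuditID]        INT               NOT NULL,  -- id of the load run
            [StartDate]      DATETIME          NOT NULL,  -- start date of this version of the row
            [EndDate]        DATETIME          NULL,      -- end date (NULL for the current version)
            [RecordSource]   VARCHAR(50)       NOT NULL   -- name of the source system
        );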

     

    Placeholder tokens

    At several places you can use placeholders that are replaced with the actual value when a job is created.

    The dimension template, for example, uses the placeholder @ObjectName in several places. When you create a job for the Customer dimension this placeholder is replaced with the name of the source object (Customer), which results in:

    • Load_dimCustomer for the job name
    • Customer for the target table name (in schema dim)
    • An additional surrogate key column with the name dimCustomerKey
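
    As a rough illustration (the real template text ships with the DWH Deck and may differ), a fragment of the dimension template could look like this before and after substitution:

        -- Template fragment with placeholders (illustrative)
        CREATE TABLE [@TargetSchema].[@TargetTable] (
            [dim@ObjectNameKey] INT IDENTITY(1,1) NOT NULL
            -- ... remaining columns ...
        );

        -- After creating a job for the Customer source object this becomes:
        CREATE TABLE [dim].[Customer] (
            [dimCustomerKey] INT IDENTITY(1,1) NOT NULL
            -- ... remaining columns ...
        );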


     

    In the template statement two types of placeholders are used:

    • Type A placeholders start with a single @ character (e.g. @SourceConnection)
    • Type B placeholders represent a column list and start with two @ characters (e.g. @@SourceColumns)

    The current Type A placeholders are:

    • @CreatedOn, the date the job was created
    • @TemplateID, the id of the template used
    • @TemplateName, the name of the template used
    • @MetaConnection, the name of the connection with the solution (meta) tables
    • @MetaServer, the name and instance of the server for the meta connection
    • @MetaDatabase, the name of the database for the meta connection
    • @SourceConnection, the name of the source connection
    • @SourceServer, the name and instance of the server for the source connection
    • @SourceDatabase, the name of the source database
    • @SourceObject, the source object, e.g. [dbo].[vw_Customers]
    • @ObjectName, the name of the source object, e.g. Customers
    • @JobID, the id of the job
    • @JobName, the name of the job
    • @TargetConnection, the name of the target connection
    • @TargetServer, the name and instance of the server for the target connection
    • @TargetDatabase, the name of the target database
    • @TargetSchema, the name of the target schema
    • @TargetTable, the name of the target table

     

    The current Type B placeholders are:

    • @@SourceColumns
      ... [col1], [col2]..
    • @@NotSelectedSourceColumns
    • @@TargetColumns
    • @@DefintionTargetColumns
      … [col1] INT IDENTITY(1,1), [col2] VARCHAR(50) ..
    • @@PKColumns
    • @@BKColumns
    • @@srcColumns
      … src.[col1], src.[col2]
    • @@tgtColumns
      … tgt.[col1], tgt.[col2]
    • @@WhereList
      … src.[col1]=tgt.[col1] AND src.[col2]=tgt.[col2]
    • @@DimWhereList
      … his.[col1]=cur.[col1] AND his.[col2]=cur.[col2]
    • @@LKPColumns
      … <Column SourceColumn=col1 TargetColumn=col1 />
    • @@SCDColumns
      … <Column SourceColumn=col1 TargetColumn=col1  MappingType=Key />
    • @@DimViewColumns
      … cur.[col1], his.[col1] as col1_his

    The Type B placeholders are defined in a separate stored procedure: meta.usp_ColumnLists.
    You can view and add definitions that are used for these placeholders.
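
    To give a feel for how such a list could be produced, here is a rough sketch for @@WhereList (it assumes a hypothetical meta.JobColumn table with JobID, ColumnName and IsBusinessKey columns; the real meta.usp_ColumnLists may use different tables and names):

        -- Hypothetical sketch: building the @@WhereList value for one job
        DECLARE @JobID INT = 1,
                @WhereList NVARCHAR(MAX);

        SELECT @WhereList = STUFF((
            SELECT ' AND src.[' + ColumnName + ']=tgt.[' + ColumnName + ']'
            FROM   meta.JobColumn
            WHERE  JobID = @JobID
              AND  IsBusinessKey = 1
            FOR XML PATH('')), 1, 5, '');

        SELECT @WhereList;   -- e.g. src.[col1]=tgt.[col1] AND src.[col2]=tgt.[col2]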

    In this blog post I described the templates.
    This concludes my first series of posts on this version of the DWH Deck.

  • Wednesday, 14 November 2012

    DWH Deck 2.0: Step 1 Create solution database and other connections

    The DWH Deck is a simple tool that allows you to build and maintain your data warehouse better and faster.

    In this blog post I’ll describe the steps necessary to create the solution database and other connections that will be used as source or target connection.

    In this sample we will use the Northwind database as the source connection, and an empty database, NorthwindDW1, as the solution and target database.


    In the DWH Deck first create the solution:

    • Enter DW1 in the Name Textbox
    • Enter Server[\Instance] in the Server Textbox
      (Do not use “localhost” or “.” for your server name)
    • Choose NorthwindDW1 in the Database Listbox.
    • Click the Create solution button


    The DW1 connection is added to the tree view, beneath Connections.
    In the Message text box, feedback is provided about the actions performed by the DWH Deck.

     

    Next, check in SQL Server Management Studio that the tables and stored procedures have been added to the database.

    • In Object Explorer open the NorthwindDW1 database.
    • Open Tables, Programmability and Stored Procedures and check the tables and stored procedures that were added to the solution database.
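
    If you prefer a query over the Object Explorer, something along these lines lists the new objects (the schema and object names created by the DWH Deck may differ):

        -- List user tables and stored procedures in the solution database
        USE NorthwindDW1;

        SELECT s.name AS schema_name, o.name AS object_name, o.type_desc
        FROM   sys.objects AS o
        JOIN   sys.schemas AS s ON s.schema_id = o.schema_id
        WHERE  o.type IN ('U', 'P')   -- user tables and stored procedures
        ORDER  BY o.type_desc, s.name, o.name;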


     

    Next we will create the source connection to the Northwind database:

    • Click Connection in the menu bar.
    • Enter Northwind in the Name Textbox
    • Enter Server[\Instance] in the Server Textbox
    • Choose Northwind in the Database Listbox.
    • Click Save in the menu bar.


     

    After we have created the connections we can start creating the jobs.
    This will be the subject of my next blog post.

    Overview of the DWH Deck 2.0

  • Wednesday, 14 November 2012

    DWH Deck 2.0: Step 2 Add Jobs by selecting source objects and applying a template

    The DWH Deck is a simple tool that allows you to build and maintain your data warehouse better and faster.

    In a previous blog post we created the solution database and added connections which we’ll use as source or target connection in this post.
    In this post we will create jobs that will describe a source to target mapping.

    Add the first job with the DWH Deck

    • Click Job in the menu bar
    • Select Northwind in the Source connection listbox
    • Select [dbo].[Categories] in the Source table listbox
    • Select 1 Staging in the Template listbox
    • Select DW1 in the Target connection listbox
    • Click Save in the menu bar


    Repeat the above steps for the following tables:

    • Customers
    • Employees
    • OrderDetails
    • Orders
    • Products


    In this step we created the jobs that describe the source to target mapping.
    After we have created these jobs we can publish them.
    This will be the subject of my next blog post.

    Overview of the DWH Deck 2.0

  • Wednesday, 14 November 2012

    DWH Deck 2.0: Step 3 Publish the jobs to create the SQL Statements and BIML Files

    The DWH Deck is a simple tool that allows you to build and maintain your data warehouse better and faster.

    In previous blog posts we created connections and jobs that describe the source to target mapping. In this blog post we’ll examine the effects of publishing the jobs.

    • Click Publish on the menu bar
    • Check the created files in the default location: C:\in2bi\DWHDeck
    • Check the created tables and stored procedures in the NorthwindDW1 database.
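
    To give an idea of what is published for the staging template, the generated load procedure for the Categories job could look roughly like this (the procedure name, schema and logic are assumptions; the actual statement comes from the template):

        -- Illustrative sketch of a generated load procedure (names are assumptions)
        CREATE PROCEDURE [stg].[usp_Load_Categories]
            @AuditID INT
        AS
        BEGIN
            TRUNCATE TABLE [stg].[Categories];

            INSERT INTO [stg].[Categories] ([CategoryID], [CategoryName], [Description], [AuditID])
            SELECT [CategoryID], [CategoryName], [Description], @AuditID
            FROM   [Northwind].[dbo].[Categories];
        END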



    In this step we examined the effects of publishing the jobs: the SQL statements that are executed and the BIML files that are built.
    In the next post I’ll describe how you create SSIS Packages from these BIML files.

    Overview of the DWH Deck 2.0

  • Tuesday, 13 November 2012

    DWH Deck 2.0: Step 4 Create SSIS packages from the created BIML Files

    The DWH Deck is a simple tool that allows you to build and maintain your data warehouse better and faster.

    In previous blog posts we created connections and jobs that describe the source to target mapping. In this step we will build SSIS packages from the published BIML files.

    To create SSIS Packages from BIML files you need to install BIDS Helper. This is a free and popular add-in for SQL Server Data Tools (the successor of the Business Intelligence Development Studio).

    To create the SSIS Packages for the BIML files we created in step 3 we’ll need to:

    • In Visual Studio create a new Integration Services Project with the name: SSIS NorthwindDW1
    • Make sure the ProtectionLevel of the project is set to DontSaveSensitive. Right-Click the project and select Properties to change this property.
    • Right-Click SSIS NorthwindDW1 in the Solution Explorer. Select Add and then Existing Item…
    • Select all the files in C:\in2bi\DWHDeck\Create BIML Package using Shift and then click Add
    • Select the BIML files under Miscellaneous in the Solution Explorer. Right-Click and select Generate SSIS Packages.

    Test the created packages:

    • To import the data using the SSIS Packages: Right-Click the MasterPackage.dtsx and select Execute Package
    • To import the data using the stored procedures: Right-Click the MasterProcedures.dtsx and select Execute Package


    In steps 1 to 4 we looked at the basics of the DWH Deck. We created staging tables, stored procedures and SSIS Packages to import the data from the source system.

    In the next blog post we will take an in-depth look at the templates.

    Overview of the DWH Deck 2.0

  • Tuesday, 11 October 2011

    Data Vault Automation Conference

    in2bi was one of the sponsors of the Data Vault Automation Conference which was held in Utrecht on October 6th.

    Data Vault has become quite popular in the Netherlands over the last few years. Due to the rigid, modular approach of Data Vault modeling, source data can easily be divided into the three standard table types with standard load mechanisms, which makes automation easy.

    Both aspects contribute to a lot of innovation in the Netherlands with respect to data warehouse automation.

    During the conference more than 100 participants were kept up to date by the speakers.

    You can download the slides of my session by clicking on the PowerPoint logo.

  • Monday, 23 May 2011

    New features in the next release of the DWH Deck

    The next release of the DWH Deck ...

    will support templates

    In the current version of the DWH Deck all SQL statements are built based upon the metadata. Future releases will support templates that you can change to fit your data warehouse / ETL framework. We will provide documentation on these templates as we go along.

    will support the creation of SSIS packages

    In the current version of the DWH Deck all ETL is handled with stored procedures. The next release will also support the creation of SSIS packages to handle the ETL, providing better error-handling possibilities.

    and more ...

     

    The current plan is to release the next version in September 2011 (updated on July 4th, 2011).

  • Monday, 18 April 2011

    DWH DECK Part 7: Data Mart

    Adapted for version 1.1 on 18/04/2011

    This is the seventh part of a series I'm writing on the DWH DECK, a tool that will allow you to build and maintain your data warehouse better and faster. This series will act as documentation and will give you some insights into the way the program works.

    Overview of this series

    The data mart area is the area that your users will use to query the data warehouse. The architecture is based on Ralph Kimball's dimensional modeling concept (the star schema).

    There are two types of tables in the data mart area:

    Fact tables
    A fact table contains the measurements associated with a specific business process. In the DWH Deck this translates to: every link table in the data vault with measures in the associated satellites will be the basis for a fact table. A fact table will contain:

    • References to dimension tables
    • All related measures (in DWH Deck: check the Measure checkbox)
    • All related Time indicators (in DWH Deck: check the Time indicator checkbox)
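
    As a rough sketch (table and column names are illustrative, based on the Northwind sample; the generated tables may differ), such a fact table could look like this:

        -- Illustrative fact table for the Northwind order lines
        CREATE TABLE [fact].[OrderDetails] (
            [dimCustomerKey] INT      NOT NULL,  -- reference to the Customer dimension
            [dimProductKey]  INT      NOT NULL,  -- reference to the Product dimension
            [OrderDate]      DATETIME NOT NULL,  -- time indicator
            [Quantity]       SMALLINT NOT NULL,  -- measure
            [UnitPrice]      MONEY    NOT NULL,  -- measure
            [Discount]       REAL     NOT NULL   -- measure
        );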

     

    Dimension Tables
    Dimensions describe the objects of the business, such as employee, customer, etc. In the DWH Deck this translates to: every hub in the data vault that has a relation with one of the aforementioned fact tables, together with its related satellites, will be the basis for a dimension table.
     
    Slowly changing dimensions
    Managing changes over time has always been a major issue in data warehousing. Kimball uses the concept of slowly changing dimensions. For every attribute/column you can enter the change type:
    Type 1. Overwrite Old Data. (Default) Use this method if you don't care about keeping track of historical values.
    Type 2. Track History. This is a powerful technique for capturing attribute values over time. In the ETL process a new row is added to the dimension table and the previous row describing the object is end-dated.
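
    A minimal sketch of the type 2 mechanism, assuming StartDate/EndDate housekeeping columns, a CompanyName attribute tracked as type 2, and, for simplicity, a staging table as the source (in the DWH Deck the dimension is loaded from the data vault, so the generated ETL may differ):

        -- End-date the current row when a tracked attribute has changed
        UPDATE tgt
        SET    tgt.[EndDate] = GETDATE()
        FROM   [dim].[Customer] AS tgt
        JOIN   [stg].[Customers] AS src ON src.[CustomerID] = tgt.[CustomerID]
        WHERE  tgt.[EndDate] IS NULL
          AND  src.[CompanyName] <> tgt.[CompanyName];

        -- Insert a new current row for every business key without a current row
        INSERT INTO [dim].[Customer] ([CustomerID], [CompanyName], [StartDate], [EndDate])
        SELECT src.[CustomerID], src.[CompanyName], GETDATE(), NULL
        FROM   [stg].[Customers] AS src
        WHERE  NOT EXISTS (SELECT 1
                           FROM   [dim].[Customer] AS tgt
                           WHERE  tgt.[CustomerID] = src.[CustomerID]
                             AND  tgt.[EndDate] IS NULL);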

  • Monday, 18 April 2011

    DWH DECK Part 6: Data Vault

    Adapted for version 1.1 on 18/04/2011

    This is the sixth part of a series I'm writing on the DWH DECK, a tool that will allow you to build and maintain your data warehouse better and faster. This series will act as documentation and will give you some insights into the way the program works.
    Overview of this series
     
    Many Tables in your data warehouse
    Using the “simple” demo database Northwind, with only 13 source tables, the DWH DECK will build 57-110 tables depending on your choices. A breakdown by table type:

    • 13 Staging tables
    • 10 Hub tables
    • 7 Link tables
    • 17-70 Satellite tables
    • 7 Dimension tables
    • 3 Fact tables

    At first sight this seems to be overkill. Especially the many relatively small tables in the data vault will be met with skepticism by some. The reason we do this is that a data warehouse should be built with the future in mind… Your company will make major changes in its ERP and CRM systems… It will acquire new companies… It will start new initiatives… Management will change (its focus)… And all this will lead to changes in your data warehouse.
     
    Utilizing a technique like the data vault with many 'small' tables will give you the flexibility to adapt better and faster to these changes. For me this is the most important reason for using the data vault method, but of course there are many more. Dan Linstedt, the inventor of the data vault, describes these advantages at:
    http://danlinstedt.com/datavault/about-the-data-vault/
     
    The Data Vault in essence consists of three types of tables:
     
    Hub Entities (Hubs)
    A hub is a unique list of business keys. The business key is the key that business users use in their daily operations to describe an object (e.g. the account number for general ledger accounts, or the VIN (Vehicle Identification Number) of a car).
    It's not always easy to determine the business key, and sometimes you will have to use more than one key
    (e.g. ZIP code and house number, or last name and first name).
     
    By default the DWH Deck assigns the primary key of a table as the business key because these values are unique. But that's often not the best choice! Speak to the business users and check the profile information of each column. Columns with a distinct value percentage of 100% are good candidates for the business key of the table.
    You can easily change the business key by checking and unchecking the business key checkbox.
     
     
    Link Entities (Links)
    Links represent the relationships between hubs and other links. In the DWH Deck links are created based on the relationships in the source system. If these relationships are all well-defined and built in the source system you won't have to do anything. If not, there are several ways to create them (e.g. click the button "New Primary relation" or "New Foreign relation" and select the relevant table and column to create the relationship).
     
    Satellite Entities (Satellites)
    Satellites contain descriptive information of the hubs and links; they provide context. By default all candidate satellite columns are placed in their own satellite table. The DWH Deck uses the Satellite textbox for the satellite's table name. This results in a table for almost every column.
    Often it is better to group related satellite columns, like address information, into the same table. This can be accomplished by using the same name in the Satellite Group textbox (e.g. use address for street, house number, zip code, city, etc.).
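
    As a hedged illustration of the three table types (schema, names and housekeeping columns are assumptions based on the Northwind sample; the tables generated by the DWH Deck may look different):

        -- Hub: unique list of business keys
        CREATE TABLE [dv].[HubCustomer] (
            [HubCustomerID] INT IDENTITY(1,1) NOT NULL,
            [CustomerID]    NCHAR(5)          NOT NULL,   -- business key
            [LoadDate]      DATETIME          NOT NULL,
            [RecordSource]  VARCHAR(50)       NOT NULL
        );

        -- Link: relationship between the Order and Customer hubs
        CREATE TABLE [dv].[LinkOrderCustomer] (
            [LinkOrderCustomerID] INT IDENTITY(1,1) NOT NULL,
            [HubOrderID]          INT               NOT NULL,
            [HubCustomerID]       INT               NOT NULL,
            [LoadDate]            DATETIME          NOT NULL,
            [RecordSource]        VARCHAR(50)       NOT NULL
        );

        -- Satellite: descriptive columns grouped into an "address" satellite
        CREATE TABLE [dv].[SatCustomerAddress] (
            [HubCustomerID] INT           NOT NULL,
            [LoadDate]      DATETIME      NOT NULL,
            [Address]       NVARCHAR(60)  NULL,
            [City]          NVARCHAR(15)  NULL,
            [PostalCode]    NVARCHAR(10)  NULL,
            [RecordSource]  VARCHAR(50)   NOT NULL
        );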