Tag: ETL

  • CSV File Format

    We all know that we should be writing file extracts as XML, but if we really need a CSV file, here is how to do it. Introduction: Comma Separated Values (CSV) files have been used since data first had to be exchanged between two applications. CSV files are an imperfect format that…

  • Data Transformation – Procedural & Non Procedural Solutions

    This paper looks at a somewhat awkward data transformation, and at solutions written in SQL and in a procedural language. It describes some techniques which can be used to develop the solution in both languages. It also compares the solutions in terms of ease of development, performance and cost of maintenance. Are we building transformations…

  • Basic Data Quality Checks

    This article looks at basic data quality audits that can be done within a database. Examples are given using Oracle syntax; however, the techniques can also be applied to other databases. Introduction: The following article discusses some of the data quality issues that can be addressed by manual scripts on a copy of the…

  • Detecting Changed Data

    Introduction: When loading data warehouses, it is usually possible to decrease the load time very significantly by processing only the changes since the last load, rather than completely refreshing all the data every time. This article describes one approach for detecting changes, which has been used successfully in a number of data warehouse projects. Background: There…

  • Job scheduling – fixed and relative timing

    This article looks at the relative merits of two types of processing schedule, one based on fixed times (similar to ‘cron’ on a Unix system) and one based on relative times (similar to ‘at’ on a Unix system). Introduction: Much of data warehousing is dependent on running regular jobs to collect data and load…