Among the Data Management activities that organizations perform, the processes that monitor and ensure data quality are becoming critical. The volume of information in organizations is constantly growing. Reliable data storage is essential for correctly analyzing and exploiting this data: it avoids inconsistencies and misleading results, and it makes it easier to develop future systems based on master data that is consistent, cleansed, enriched and reliable.
Informatica is one of the major vendors in the Data Integration market and the leading independent provider of data integration software. Its best-known tool, and the heart of its platform, is Informatica PowerCenter, which has gone through many versions and is a reference in the integration world.
Apart from PowerCenter, Informatica also offers other tools that focus on more specific purposes while remaining integrated into the platform, always within the context of Data Integration.
The Data Profiling Task of SQL Server Integration Services stores profiling results in an XML document that can be examined with the Data Profile Viewer. The article Dataprofiling with SQL Server 2008 explains how to use this new task in SSIS.
Although this method is very simple, it is sometimes not sufficient. A data quality project may require, for example, storing a history of profiles to assess how the quality of the processed data has improved over time.
The best way to work with historical data is to store it in database tables, where you can run queries, reports and comparisons. To achieve that, all you need to do is move the metadata that the profiling task stores in the XML file into database tables.
Well, someone has already prepared an easy way to do it. Thomas Frisendal, from the website Information Quality Solutions, explains how to create an XSLT file for each type of profile. Each XSLT transforms the XML generated by the SSIS Data Profiling Task into one or more XML files in a format that can be imported directly into tables.
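As a rough illustration of the idea of keeping a profile history in tables, here is a minimal Python sketch. It is not the XSLT approach described above and does not use the real output format of the task: the XML layout, the attribute names and the null_ratio_history table are all hypothetical, chosen only to show how flattened profile results could be appended per run and then compared over time.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified profile output. The real XML written by the
# profiling task has its own schema and namespaces, not reproduced here.
SAMPLE_XML = """
<Profiles>
  <ColumnNullRatioProfile table="Customers" column="Email" rowCount="1000" nullCount="50"/>
  <ColumnNullRatioProfile table="Customers" column="Phone" rowCount="1000" nullCount="200"/>
</Profiles>
"""


def load_null_ratio_profiles(conn, xml_text, run_date):
    """Flatten the profile elements into rows and append them to a history table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS null_ratio_history ("
        "run_date TEXT, table_name TEXT, column_name TEXT, "
        "row_count INTEGER, null_count INTEGER)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        (run_date, p.get("table"), p.get("column"),
         int(p.get("rowCount")), int(p.get("nullCount")))
        for p in root.iter("ColumnNullRatioProfile")
    ]
    conn.executemany("INSERT INTO null_ratio_history VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def null_ratio_trend(conn, table_name, column_name):
    """Return (run_date, null_ratio) pairs for one column, oldest run first."""
    cur = conn.execute(
        "SELECT run_date, CAST(null_count AS REAL) / row_count "
        "FROM null_ratio_history "
        "WHERE table_name = ? AND column_name = ? "
        "ORDER BY run_date",
        (table_name, column_name),
    )
    return cur.fetchall()
```

Calling load_null_ratio_profiles once per profiling run, each time with the run's date, builds up the history; null_ratio_trend then shows whether the share of nulls in a column is going down as the data is cleaned.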
One of the many improvements that SQL Server 2008 brought to ETL with Integration Services is the ability to perform data profiling with its new Data Profiling Task.
Data profiling is one of the first tasks typically addressed in Data Quality processes. It involves an initial analysis of the source data, usually tables, with the goal of getting to know its structure, format and level of quality. The analysis is performed at the table level, at the column level, on relationships between columns, and even on relationships between tables.
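To make the kind of analysis involved more concrete, here is a small Python sketch of two typical checks: a column-level profile (null ratio, distinct values, length distribution) and a multi-column check for whether a set of columns could act as a key. It is only an illustration under assumed names and sample data (ROWS, profile_column, is_candidate_key are all hypothetical) and is not how any particular profiling tool is implemented.

```python
from collections import Counter

# Hypothetical sample rows standing in for a source table.
ROWS = [
    {"id": 1, "country": "ES", "zip": "28001"},
    {"id": 2, "country": "ES", "zip": None},
    {"id": 3, "country": "FR", "zip": "75001"},
    {"id": 4, "country": "FR", "zip": "7500"},
]


def profile_column(rows, column):
    """Compute a few basic column-level measures over a list of dict rows."""
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    lengths = Counter(len(str(v)) for v in non_null)
    return {
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_values": len(set(non_null)),
        "length_distribution": dict(lengths),
    }


def is_candidate_key(rows, columns):
    """True if the column combination has no nulls and no duplicate values."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    has_nulls = any(v is None for key in keys for v in key)
    return not has_nulls and len(set(keys)) == len(keys)
```

With the sample rows, profile_column(ROWS, "zip") reveals both a missing value and an unexpected four-character code, which is exactly the sort of early finding that profiling is meant to surface before any cleansing begins.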
To use the SSIS Data Profiling Task, you select a table in a SQL Server 2000 or later database (it does not work with other databases), the profiling options you want to run on the data in that table, and an XML file in which to save the result. It's really simple.
You can select up to 8 types of profiling: 5 for column-level analysis and 3 for multi-column analysis.
Column level profile:
Power MatchMaker is a Data Cleansing tool that SQL Power has released under an Open Source license, along with Power Architect (a Data Modeling tool). As there are not many Open Source tools in the field of data cleansing, I was curious and installed it to see how it works. The installation was very simple: the software is downloaded from Download Power MatchMaker, in different versions depending on the OS. I tried the Windows version, which installs with a couple of clicks in about 2 minutes. It is important not to forget the Java Runtime 5 requirement. Once it is installed, the best way to see how it works is to follow the tutorial included in the tool's help. I also recommend seeing the demo available from the same
The link Managing Data Quality leads to an article by Ron Hardman on how to carry out data cleansing processes with Oracle Warehouse Builder.
The article begins with an introduction to data quality and the ways to manage it, one of which is to use the data cleansing options of Oracle Warehouse Builder.
The interesting part is that it shows how to download a script with test data and how to configure the tool to try out its Profiling features, the definition of rules (Data Rules), and the correction or cleansing of the data. This makes it easy to see and test how to implement a basic Data Cleansing process with this tool.
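The general pattern of defining a rule, finding the rows that break it and then applying a correction can be sketched in a few lines of Python. This does not use Oracle Warehouse Builder or reproduce the article's script; the sample rows, the country-code rule and all function names are hypothetical and serve only to illustrate the rule-then-correct cycle.

```python
import re

# Hypothetical test data with one clean value, one fixable value and one bad value.
TEST_ROWS = [
    {"id": 1, "country": "ES"},
    {"id": 2, "country": " es "},
    {"id": 3, "country": "Spain"},
]


def is_country_code(value):
    """Example rule: the value must be exactly two uppercase letters."""
    return isinstance(value, str) and re.fullmatch(r"[A-Z]{2}", value) is not None


def normalize_country(value):
    """Example correction: trim surrounding whitespace and uppercase the value."""
    return value.strip().upper() if isinstance(value, str) else value


def find_violations(rows, column, is_valid):
    """Return the rows whose value in `column` does not satisfy the rule."""
    return [row for row in rows if not is_valid(row[column])]


def cleanse(rows, column, fix):
    """Return new rows with the correction applied to `column`."""
    return [{**row, column: fix(row[column])} for row in rows]
```

Running find_violations before and after cleanse shows which problems a simple correction resolves and which rows still need attention, such as a full country name where a code is expected.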
The original article is in English, but searching the Oracle website I found the following 3 documents, translated into Spanish, related to OWB and data cleansing:
- Executive report - Oracle Warehouse Builder 11g Version 1 General Information
- Oracle Warehouse Builder Data Quality Option
- Oracle Warehouse Builder Enterprise ETL Option