Data Quality
Data Profiles of SQL Server Integration Services stored in tables
Submitted by carlos on 14 September, 2009 - 09:36
The Data Profiling Task of SQL Server Integration Services stores the results of profiling in an XML document that can be examined with the Data Profile Viewer. The article Data profiling with SQL Server 2008 explains how to use this new task in SSIS.
Although this method is very simple, it is not always sufficient. A data quality project may require, for example, keeping a history of profiles to assess how the quality of the processed data has improved over time.
The best way to work with historical data is to store it in database tables, where you can run queries, reports and comparisons. To achieve that, all you need to do is move the metadata that the profiling task writes to the XML file into database tables.
Fortunately, someone has already prepared an easy way to do it. Thomas Frisendal, from the website Information Quality Solutions, explains how to create an XSLT file for each type of profile. Each XSLT transforms the XML generated by the SSIS Data Profiling Task into one or more XML files in a format that can be imported directly into tables.
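To give a rough idea of what keeping profile results in tables makes possible, here is a minimal Python sketch. It is not the XSLT approach described above: it flattens a small profile document into rows with the standard library and loads them into an in-memory SQLite table. The XML layout (`profiles`, `null_ratio` and its attributes), the table name `null_ratio_history` and all function names are assumptions made up for this illustration. The real output of the profiling task uses a different, namespaced schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified profile output. The real SSIS profile XML
# uses a different, namespaced schema; this is only for illustration.
SAMPLE_XML = """
<profiles>
  <null_ratio table="customers" column="email" null_count="120" row_count="1000"/>
  <null_ratio table="customers" column="phone" null_count="300" row_count="1000"/>
</profiles>
""".strip()


def flatten(xml_text, run_date):
    """Turn each <null_ratio> element into one flat row tagged with the run date."""
    root = ET.fromstring(xml_text)
    return [
        (run_date, e.get("table"), e.get("column"),
         int(e.get("null_count")), int(e.get("row_count")))
        for e in root.iter("null_ratio")
    ]


def store(conn, rows):
    """Append the rows of one profiling run to the history table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS null_ratio_history ("
        "run_date TEXT, table_name TEXT, column_name TEXT, "
        "null_count INTEGER, row_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO null_ratio_history VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def null_ratio_trend(conn, table, column):
    """Return (run_date, null_ratio) pairs for one column, oldest run first."""
    cur = conn.execute(
        "SELECT run_date, 1.0 * null_count / row_count "
        "FROM null_ratio_history "
        "WHERE table_name = ? AND column_name = ? "
        "ORDER BY run_date",
        (table, column),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# First profiling run.
store(conn, flatten(SAMPLE_XML, "2009-09-01"))
# Second run, after some cleansing reduced the NULLs in the email column.
later_xml = SAMPLE_XML.replace('null_count="120"', 'null_count="40"')
store(conn, flatten(later_xml, "2009-09-14"))

for run_date, ratio in null_ratio_trend(conn, "customers", "email"):
    print(run_date, ratio)
```

Once every run is tagged with its date, comparing the quality of a column over time becomes an ordinary SQL query.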
Data profiling with SQL Server 2008
Submitted by carlos on 14 September, 2009 - 09:03
One of the many improvements that SQL Server 2008 brings to ETL with Integration Services is the ability to perform data profiling with the new Data Profiling Task.
Data profiling is one of the first tasks typically addressed in data quality processes. It involves an initial analysis of the source data, usually tables, with the goal of getting to know its structure, format and level of quality. The analysis covers tables, individual columns, relationships between columns, and even relationships between tables.
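As a rough illustration of what a column-level analysis computes, here is a minimal Python sketch. It has nothing to do with how SSIS works internally: the function name `profile_column`, the metrics chosen and the sample postcodes are assumptions picked for this example.

```python
from collections import Counter


def profile_column(values):
    """Return basic column-level metrics for a list of values (None means NULL)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    # How many values have each string length; useful for spotting bad formats.
    lengths = Counter(len(str(v)) for v in non_null)
    return {
        "row_count": total,
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(set(non_null)),
        "length_distribution": dict(lengths),
    }


# Sample column: one NULL, one duplicate, one value missing its leading zero.
postcodes = ["08001", "08002", None, "8003", "08001"]
print(profile_column(postcodes))
```

Even on a handful of rows the length distribution exposes the postcode that lost its leading zero, which is the kind of finding a first profiling pass is meant to surface.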
The SSIS Data Profiling Task works by selecting a table in a SQL Server 2000 or later database (it does not work with other databases), the profiling options you want to run on the data in the table, and an XML file in which to save the result. It's really simple.
You can select up to 8 types of profiling: 5 for column-level analysis and 3 for multi-column analysis.
Column level profile:
Dataclean.es: a data cleansing services project
Submitted by carlos on 12 December, 2008 - 00:00
Quite some time ago I considered starting a project to offer data cleansing services online. In terms of what you hear most about these days, it could be read as a new meaning of the acronym DaaS: Datacleansing as a Service.
At that time I chose the name Dataclean.es, among other reasons because the domain was free. I registered it in my name and drafted a business plan. I even began to prepare a website where I wanted to build a first simple version of the idea. That prototype never got much further than a bare structure, but I think it can serve to illustrate what I had in mind.
Since in the end I decided not to take the big step and develop the project, and it would be a pity for the effort I put into the plan to stay in a document on my laptop, I have decided to share the business plan, attached to this post. I have also put online the prototype website that I started. Note that it is exactly as I left it; almost nothing works.

Informatica World 2008 in Las Vegas
Submitted by carlos on 27 June, 2008 - 01:37
I was finally able to attend Informatica World 2008, and here I'll try to explain what I found there. The conference was held under the slogan 'Gain the Edge' from June 3 to 5.
The first day began (after breakfast) with a general session entitled Vision. Strategy. Technology Announcements. Industry leadership. In this session Sohaib Abbasi, President and CEO of Informatica, together with Chris Boorman, Ivan Chon and Girish Pancha, vice presidents of Marketing, Data Quality and Data Integration respectively, presented their vision of the current market, how technology and business are evolving, and what role data and data-related applications play in this context.
Much of it was familiar to me, because I had already seen it at Powerday 2008 Barcelona. This indicates that the company maintains a well-defined strategy and shares it with its partners.
Each speaker, from the perspective of his own area, put particular emphasis on the value of data and on how important it is for every organization to maintain consistency and quality and to ensure timely data availability, protection, synchronization, and efficient data management and exchange with other organizations.
It was pointed out that having a Data Warehouse system that delivers new information only every few hours is no longer sufficient. The Internet, technological development, globalization and our competitors are the reasons. We repeatedly heard terms like SaaS, Real Time and Data Quality, which are clues about the new features of PowerCenter and the company's other tools.
They also gave an interesting demo of how an application like Salesforce.com can be synchronized in real time over the Internet with a Google Docs spreadsheet. This cloud-to-cloud computing example showed Salesforce.com on the left screen and the Google spreadsheet on the right screen, each controlled from a laptop. They made a change to the Salesforce.com data, and we saw the Google spreadsheet update automatically. Then they made a change in the spreadsheet, and we saw the Salesforce.com data update as well. Finally, they repeated the last part using an iPod Touch instead of the laptop. We should consider the opportunities that new mobile devices bring once they are connected to the Internet.
Data Quality and Integration at PowerDay 2008
Submitted by carlos on 26 May, 2008 - 08:55
The seventh edition of Powerday was held this April. This is an annual event organised by PowerData, and its objective is to give attendees a comprehensive view of the appropriate strategy for making the most of their data. I had the opportunity to attend the one held in Barcelona.
The event consisted of half-hour presentations about the importance of Data Quality and Integration processes, and about the current market and technological situation. Naturally, they also talked about how Informatica tools like PowerCenter can help make things easier.
These are the presentation titles:
