Note: I'm running a bottom-up design on this project, as I won't know what data I'm really working with until I can get it imported and analyzed. Also, I'm not a DBA or a developer, so please be gentle...
I am importing 30k+ rows using SSIS (an OLE DB source against DB2 to an OLE DB destination on SQL Server 2005). The import works fine, but I just realized that I need to set up a primary key on the emp_id column. The problem is that in the DB2 source, some emp_ids were removed (set to whitespace, but not NULL). So I can't just uncheck the 'keep nulls' option and import the data.
Any suggestions or links (using SSIS) on how to identify the rows where emp_id is all whitespace, and either 1) keep them from being imported, or 2) remove them afterwards?
(I suppose this could be done with a SQL statement to identify the whitespace rows, but that would present difficulties of its own due to the random spacing in the updates. Also, I'm hoping for a checkbox-wonder solution.)
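For reference, a minimal sketch of that SQL approach, assuming a destination table named dbo.Employees_Import and a character emp_id column (both names are hypothetical); trimming first sidesteps the random spacing:

```sql
-- Hypothetical table/column names: find rows whose emp_id is only spaces.
-- LTRIM + RTRIM reduces any run of spaces to an empty string, so the
-- random spacing doesn't matter.
SELECT *
FROM dbo.Employees_Import
WHERE emp_id IS NOT NULL
  AND LTRIM(RTRIM(emp_id)) = '';
```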
Please advise. Thanks!
- Isaac
Why not use a Conditional Split to look for NULLs, and for NULLs resulting from a TRIM() operation? So TRIM() your data in the Conditional Split, and then test that for NULL. If it matches, then you can use that tagged output stream to do with it whatever you wish... You can throw the rows away, or you can push them to their own destination (flat file, SQL Server, etc...).
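As a sketch of that condition: the SSIS 2005 expression language has no single TRIM(), only LTRIM and RTRIM, so the Conditional Split condition would be something along the lines of `ISNULL(emp_id) || LTRIM(RTRIM(emp_id)) == ""` (column name assumed). The equivalent test in T-SQL, useful for checking the diverted row count against the source:

```sql
-- Hypothetical names: count the rows the Conditional Split should divert,
-- i.e. NULL keys plus keys that trim down to an empty string.
SELECT COUNT(*) AS bad_key_rows
FROM dbo.Employees_Import
WHERE emp_id IS NULL
   OR LTRIM(RTRIM(emp_id)) = '';
```
|||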
That worked perfectly. Thanks for the advice Phil!
- Isaac
|||
While experimenting, I also found that the Sort transform can accomplish this task. Not only are the rows with whitespace removed, but it also removes duplicate IDs from the list... two birds with one stone (a Sort transform with the remove-duplicates option vs. a trim-and-split).
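For reference, a rough T-SQL analogue of what that remove-duplicates option does (table name assumed), keeping one arbitrary row per emp_id and discarding the rest:

```sql
-- Hypothetical names: keep one row per emp_id and delete the rest,
-- roughly mimicking Sort + "Remove rows with duplicate sort values".
;WITH ranked AS (
    SELECT emp_id,
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY emp_id) AS rn
    FROM dbo.Employees_Import
)
DELETE FROM ranked
WHERE rn > 1;
```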
Awesome... once again thanks!
- Isaac
|||
isaacb wrote: While experimenting, I also found that the Sort transform can accomplish this task. Not only are the rows with whitespace removed, but it also removes duplicate IDs from the list... two birds with one stone (a Sort transform with the remove-duplicates option vs. a trim-and-split).
Awesome... once again thanks!
- Isaac
Hmm... I don't like that... I don't like that the Sort transformation removes rows with spaces in them. For that matter, I don't want it to remove NULLs either. Getting rid of duplicates, yes, but I would think your result set would be reduced to just one row with spaces, as opposed to none. Are you sure it just discarded ALL rows that were "empty"?
|||
Looking at it again, it's not perfect, as it does leave one whitespace row behind (the first one it finds). While that wouldn't be acceptable in a real-world scenario, it works for the first rough pass on my project.
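If that one leftover blank row ever needs to go, a minimal post-load cleanup sketch (table name assumed) would be:

```sql
-- Hypothetical names: post-load cleanup to delete the one surviving
-- all-spaces row that the de-duplication leaves behind.
DELETE FROM dbo.Employees_Import
WHERE LTRIM(RTRIM(emp_id)) = '';
```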
The sort-and-deduplicate functionality actually works rather well when you select your PK as the column to sort on. It only checks the columns that you specify, so all of the verified data is still there. I checked the results against a report that I pulled off the server... I eyeballed it for a few minutes, but it seems to be accurate.
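In place of eyeballing a report, a quick deterministic check (hypothetical names) that the de-duplication actually held:

```sql
-- Hypothetical names: after the Sort transform's de-duplication, every
-- emp_id should appear exactly once, so this should return zero rows.
SELECT emp_id, COUNT(*) AS occurrences
FROM dbo.Employees_Import
GROUP BY emp_id
HAVING COUNT(*) > 1;
```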
Maybe I'm mis-using the functionality (?), but it works...
|||
isaacb wrote: Looking at it again, it's not perfect, as it does leave one whitespace row behind (the first one it finds). While that wouldn't be acceptable in a real-world scenario, it works for the first rough pass on my project.
The sort-and-deduplicate functionality actually works rather well when you select your PK as the column to sort on. It only checks the columns that you specify, so all of the verified data is still there. I checked the results against a report that I pulled off the server... I eyeballed it for a few minutes, but it seems to be accurate.
Maybe I'm mis-using the functionality (?), but it works...
No, using the Sort transformation to remove duplicates is a very valid use. And you get sorted data, which helps in most cases for downstream transformations...
I was just concerned when you said it removed all of the rows with spaces; in fact, it did what it's supposed to do, which was to remove duplicates and therefore leave one row behind.
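A toy illustration of that behavior, with made-up fixed-width keys (CHAR, so the all-spaces values compare as duplicates):

```sql
-- Made-up fixed-width data: removing duplicates keeps one row per
-- distinct key value, so one all-spaces row survives rather than none.
DECLARE @ids TABLE (emp_id CHAR(4));
INSERT INTO @ids VALUES ('1001');
INSERT INTO @ids VALUES ('1001');
INSERT INTO @ids VALUES ('    ');
INSERT INTO @ids VALUES ('    ');
INSERT INTO @ids VALUES ('2002');

SELECT DISTINCT emp_id
FROM @ids;   -- returns '    ', '1001', '2002'
```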