
Wednesday, March 21, 2012

Please Advise: Big Control Flow or Big Data Flow

Hi all! I recently started working with SSIS and one of the things that is puzzling me the most is what's the best way to go:

1. A small control flow, with large data flow tasks
2. A control flow with more, but smaller, data flow tasks

Any help will be greatly appreciated.
Thanks,
Ricardo
Multiple smaller data flows are generally better than a single big one. Breaking a large process into smaller data flows allows you to manage memory better and offers more opportunities for restartability. Large data flows can also suffer from threading problems in the current version.

|||

Dear Friend,

It depends on your system requirements. But there are things that you can only do inside the data flow...

Regards!

|||

Ricardo,

I'd generally agree with what the other guys have said. However, I think it would be rare that you would get the chance to make this decision. The nature of your data flow is determined by...well...whatever the reason you're building it is. You don't generally get the chance to say "Hmmm, I think I'll put fewer components in this data flow". You can of course put more, but you would be foolish to do so.

Similarly, when you have a requirement it's pretty clear whether you need a task or a component to achieve it, so, again, you don't (usually) have a choice.

-Jamie

|||

Thanks everyone for the quick replies. I'm finding it easier as I go along. Some mistakes in the beginning, but after a while, things start being a bit more logical.

Thanks,
Ricardo


Tuesday, March 20, 2012

Placing SSIS packages on the server

I have created my packages and I want to place them on the server. Do I need to place the entire project of DTS packages on the server, or is there an option to place just the executables? If so, please explain.

And to run these packages on the server, do I need to set them up as a new job in SQL Server Agent, or is there another way to run them on the server?

I want them to run whenever the text file gets updated. Is it possible to set anything up so that my packages run as and when the text file gets updated?

Please help me with all my questions.

Thanks in advance.

If you are using file-based packages (.dtsx), you just need to move the files to a server where the Integration Services service is running. Then, yes, you can use SQL Server Agent to schedule their execution. You could include some conditional logic in your package to check whether the text file has been updated, and then run the package on a periodic basis; see the sketch below.
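A minimal Script Task sketch of that check, in VB.NET (the language SSIS 2005 script tasks use). The variable names User::FilePath, User::LastRunTime, and User::FileChanged are hypothetical; they would have to be created in the package and listed in the task's ReadOnlyVariables/ReadWriteVariables:

    ' Compare the file's last-write time against a stored timestamp and
    ' set a boolean that a precedence constraint can test afterwards.
    Imports System.IO

    Public Sub Main()
        Dim path As String = Dts.Variables("User::FilePath").Value.ToString()
        Dim lastWrite As Date = File.GetLastWriteTime(path)
        Dim lastRun As Date = CDate(Dts.Variables("User::LastRunTime").Value)
        Dts.Variables("User::FileChanged").Value = (lastWrite > lastRun)
        Dts.TaskResult = Dts.Results.Success
    End Sub

A precedence constraint testing the Boolean @[User::FileChanged] would then gate the rest of the package.

|||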

What or how can I check to see if the file is updated? Is there a task for that? Please help me with this.

And all my files are .dtsx files, so I need to move all of them. Should the server have Business Intelligence Development Studio installed, or can I just take these .dtsx packages and execute them through SQL Server Agent?

Thanks

|||

OK, one thing at a time. BIDS is a development tool; it is not required on the server in order to execute the packages. The server needs to have an SSIS instance up and running; then you can use dtexec (the package execution utility) to run the package via the command line.
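For a file-based package, the call looks like this (the path is just an illustration):

    dtexec /F "C:\Packages\MyPackage.dtsx"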

Additionally, if you want to schedule a package, you need to install the database engine and SQL Server Agent components on the server.

Regarding how to include some logic inside the package to check if the file has been 'updated': I know there have been similar discussions on this forum about that; here are a couple:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1030485&SiteID=1

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=502717&SiteID=1

|||

>> The server needs to have an SSIS instance up and running; then you can use dtexec (the package execution utility) to run the package via the command line.

Could you please tell me what an SSIS instance means, and more about the package execution utility? Sorry for the very basic questions; I am not experienced with this topic.

I want my package to run automatically, so I want to schedule it. Please let me know how to install the database engine and SQL Server Agent components on the server.

Thanks

|||

Run the SQL Server setup program on the server. Choose Integration Services to get SSIS installed; choose Database Engine and Agent to get the other pieces.

Books Online has excellent help for setting up the components of SQL Server, and there are a number of walkthroughs available online if you search for "Setup SQL Server 2005".

Wednesday, March 7, 2012

PK constraint enforcement using SSIS to import data

Note: I'm running a bottom-up design on this project, as I won't know what data I'm really working with until I can get it imported and analyze it. Also, I'm not a DBA or developer, so please be gentle...

I am importing 30k+ rows using SSIS (OLE DB source on DB2 to OLE DB destination on SQL Server 2005). The import works fine, but I just realized that I need to set up a primary key on the emp_id column. The problem is that in the DB2 source, the emp_ids were removed (set to whitespace, but not NULL). So, I can't just uncheck the 'keep nulls' option and import the data.

Any suggestions or links (using SSIS) on how to identify the rows where emp_id is all whitespace and either 1) keep them from being imported, or 2) remove them afterwards?

(I suppose this could be done with a SQL statement to identify the whitespace rows, but that would present difficulties of its own due to the random spacing nature of the updates. Also, I'm hoping for a checkbox-wonder solution.)

Please advise. Thanks!

- Isaac

Why not use a conditional split to look for NULLs and for values that are empty after a trim operation?

So trim your data in the conditional split, and then test the result. If it matches, you can use that tagged output stream to do whatever you wish with those rows... You can throw them away, or you can push them to their own destination (flat file, SQL Server, etc.).
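A sketch of that conditional split expression (SSIS expressions have no single TRIM function, so LTRIM and RTRIM are combined; emp_id is the column from the original post):

    ISNULL(emp_id) || LTRIM(RTRIM(emp_id)) == ""

Rows for which this evaluates to TRUE go down the tagged output; everything else continues on to the destination.

|||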

That worked perfectly. Thanks for the advice Phil!

- Isaac

|||

While experimenting, I also found that the Sort transform can accomplish this task. Not only are the rows with whitespace removed, but this transform also removes duplicate IDs from the list... two birds with one stone (using a Sort transform with 'remove duplicates' vs. a trim split).

Awesome... once again thanks!

- Isaac

|||

isaacb wrote:

While experimenting, I also found that the Sort transform can accomplish this task. Not only are the rows with whitespace removed, but this transform also removes duplicate IDs from the list... two birds with one stone (using a Sort transform with 'remove duplicates' vs. a trim split).

Awesome... once again thanks!

- Isaac

Hmm... I don't like that... I don't like that the Sort transformation removes rows with spaces in them. For that matter, I don't want it to remove NULLs either. Getting rid of duplicates, yes, but I would think your result set would be reduced to just one row with spaces, as opposed to none. Are you sure it discarded ALL rows that were "empty"?

|||

In looking at it again, it's not perfect, as it does leave one whitespace row behind (the first one that it finds). While that wouldn't be acceptable in a real-world scenario, it works for the first rough pass on my project.

The sort/delete functionality actually works rather well when you select your PK as the column to sort on. It only checks against the columns that you specify, so all the verified data is still there. I checked the results against a report that I pulled off the server... I eyeballed it for a few minutes, but it seems to be accurate.

Maybe I'm misusing the functionality (?), but it works...

|||

isaacb wrote:

In looking at it again, it's not perfect, as it does leave one whitespace row behind (the first one that it finds). While that wouldn't be acceptable in a real-world scenario, it works for the first rough pass on my project.

The sort/delete functionality actually works rather well when you select your PK as the column to sort on. It only checks against the columns that you specify, so all the verified data is still there. I checked the results against a report that I pulled off the server... I eyeballed it for a few minutes, but it seems to be accurate.

Maybe I'm misusing the functionality (?), but it works...

No, using the Sort transformation to remove duplicates is a very valid use. And you get sorted data, which helps in most cases for downstream transformations...

I was just concerned when you said it removed all of the rows with spaces; in fact it did what it's supposed to do, which was to remove duplicates and therefore leave one row behind.
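If that one leftover blank row has to go before the primary key constraint is applied, a post-load cleanup along these lines would catch it (the table name dbo.EmployeeImport is hypothetical):

    -- Remove any row whose emp_id is NULL or all whitespace after the load
    DELETE FROM dbo.EmployeeImport
    WHERE emp_id IS NULL OR LTRIM(RTRIM(emp_id)) = '';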