R, Python and SAS Analytics: Using large datasets in SAS! Append vs. Set / Union.

Thursday, March 29, 2012

Guys, this is a smart tip!
I came across it while dealing with some huge data loads.

Where to use it? Whenever you are adding observations to a huge dataset.

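Code:

A)
PROC DATASETS;
APPEND OUT = membr_a
DATA = membr_b (WHERE=(year=2000));
RUN;
QUIT; /* PROC DATASETS is interactive; QUIT ends it */

B)
DATA membr_a;
SET membr_a membr_b (WHERE=(year=2000)); /* WHERE= applies to membr_b only */
RUN;

C)
PROC SQL;
CREATE TABLE xxx AS
/* UNION drops duplicate rows; UNION ALL keeps them */
SELECT * FROM membr_a UNION SELECT * FROM membr_b;
QUIT;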

Explanation of code:

A)
OUT --> the existing (base) dataset to which you want to append. OUT= is an alias for BASE=.
DATA --> the new dataset whose observations you want to add to the existing dataset (the one named in OUT).

B) & C) --- I won't explain them, too trivial!

Advantage?

The APPEND statement (or the standalone PROC APPEND) adds observations from one dataset to the end of another. Unlike the SET statement, which reads all observations from every dataset being concatenated and writes the whole result out again, APPEND reads only the observations from the dataset being appended. By using APPEND instead of SET you save processing time and work space.
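For comparison, here is the same append written with the standalone procedure. This is only a sketch that reuses the membr_a and membr_b datasets from the code above; BASE= plays the role of OUT= in version A.

```sas
/* Only the filtered rows of membr_b are read; membr_a is not re-read or rewritten */
proc append base=membr_a
            data=membr_b (where=(year=2000));
run;
```

In version B, by contrast, the DATA step reads every observation of membr_a and writes it back out just to add the new rows at the end.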

Options: the FORCE option is available when the two datasets being appended do not have exactly the same structure!
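A minimal sketch of FORCE, assuming membr_b carries a variable that membr_a lacks or a character variable that is longer than its counterpart in membr_a:

```sas
/* FORCE lets the append run even though the structures differ.
   Variables found only in membr_b are dropped, and character values
   longer than the base variable's length are truncated. */
proc append base=membr_a
            data=membr_b (where=(year=2000))
            force;
run;
```

Check the log afterwards: SAS writes warnings for the dropped variables and the length mismatches.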

Thanks,
KB

2 comments:

Anders Sköllermo, March 29, 2012 at 2:19 PM
PROC APPEND works like an OS Utility - it performs block copy of observations, using suitable values of the internal blocksize parameters.

Plus: Very fast, basically only I/O, very little CPU.
Minus: All variables should be in the correct shape and in the wanted order, length, etc.

Option FORCE is available, but can be dangerous if you do not know exactly what you are doing.

Suggestion: First use ordinary SET to investigate the tables, and get both tables in the wanted shape. Then use PROC APPEND.
/ Br Anders Sköllermo
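The suggestion above could be sketched roughly as follows. The variable member_name, its length $40 and the dataset membr_b_fixed are hypothetical placeholders; use whatever attributes PROC CONTENTS reports for membr_a.

```sas
/* 1. Compare variable names, types and lengths of both tables */
proc contents data=membr_a;
run;
proc contents data=membr_b;
run;

/* 2. Bring the new data into the shape of the base table.
      member_name and $40 are made-up examples. */
data membr_b_fixed;
    length member_name $40;   /* declared before SET, so this length is used */
    set membr_b (where=(year=2000));
run;

/* 3. Append without FORCE, now that the structures match */
proc append base=membr_a data=membr_b_fixed;
run;
```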
krypton Bond, March 30, 2012 at 11:03 AM
Thanks, Anders, for the extra insight!