Monday, September 29, 2014

What is Business Analytics?

What is Business Analytics?





Saturday, September 6, 2014

One Hot Encoding for Machine learning

One Hot encoding is used on a particular data set (a matrix) and used as training data for learning algorithms, it gives significantly better results with respect to prediction accuracy, compared to using the original matrix itself as training data.

Many learning algorithms either learn a single weight per feature, or they use distances between samples. The former is the case for linear models such as logistic regression, which are easy to explain.

Suppose you have a dataset having only a single categorical feature "nationality", with values "UK", "French" and "US". Assume, without loss of generality, that these are encoded as 0, 1 and 2. You then have a weight w for this feature in a linear classifier, which will make some kind of decision based on the constraint w×x + b > 0, or equivalently w×x < b.

The problem now is that the weight w cannot encode a three-way choice. The three possible values of w×x are 0, w and 2×w. Either these three all lead to the same decision (they're all < b or ≥b) or "UK" and "French" lead to the same decision, or "French" and "US" give the same decision. There's no possibility for the model to learn that "UK" and "US" should be given the same label, with "French" the odd one out.

By one-hot encoding, you effectively blow up the feature space to three features, which will each get their own weights, so the decision function is now w[UK]x[UK] + w[FR]x[FR] + w[US]x[US] < b, where all the x's are booleans. In this space, such a linear function can express any sum/disjunction of the possibilities (e.g. "UK or US", which might be a predictor for someone speaking English).

Similarly, any learner based on standard distance metrics (such as k-nearest neighbors) between samples will get confused without one-hot encoding. With the naive encoding and Euclidean distance, the distance between French and US is 1. The distance between US and UK is 2. But with the one-hot encoding, the pairwise distances between [1, 0, 0], [0, 1, 0] and [0, 0, 1] are all equal to √2.

This is not true for all learning algorithms; decision trees and derived models such as random forests, if deep enough, can handle categorical variables without one-hot encoding.


Cheers !!
Keep learning ~ stay hungry grow smarter !!

Thursday, September 12, 2013

Creating Validation using SAS format and Intelligent handling plus accumulating format validation error.


I created a small program to cross validate the incoming data with a Master data file.

Approach : I created SAS formats for various validation fields according to business rules applicable.
Then check the incoming data with the formats already created.
Then I accumulated the warning\error in a different file. This way I could accumlate all the error& warnings in one file.


1) Create a format by hardcoding , (I had created format using MS-access files too -- will cover it later on)
proc format library=prod;
value $TODA
'1001INTEGRATION_' = '7081'
'1001PROD' = '7081'
'1001SYSTEM_TEST' = '7081'
OTHER = '0'
;
run;

/*Store SAS format in to required lib*/

proc format
Fmtlib lib=work
cntlout=prod;
select $TODA;
run;


2) Performing validation
/* Validate all postings */

data work.AdjustmentX (drop=nr imperr impfield local_ammount_err accgroup)
work.errorlist_GL (keep=nr imperr impfield)
work.warnlist_GL (keep=nr imperr impfield)
work.cd12026_errorlist_adj (keep=nr imperr impfield)
;
set work.AdjustmentX end=slut;

if cd01001 > ' ' then do;
tst = put(cd01001,$accunit.); /* Performing validation */
if tst = 1 then do;
tst = put(compress(&cd01041!!cd01001),$ccac.);
if tst ^= 1 then do;
%FieldError(cd01001,Account unit not valid with company code &cd01041,errorlist_GL);
end;
end;
else do;
%FieldError(cd01001,Profit center invalid,errorlist_GL); /* Error macro */
end;
end;
else do;
if transaction_type = 'GL' then do;
%FieldWarning(cd01001,Tick-Off Ledger Id missing,Warnlist_GL); /* Warning Marco*/
end;
end;
run;



3) Accumulate Errors and warning via a general macro into a particular dataset.


Here i have called few general macros if there is a error or if i wana give a warning.

/* general macros*/

%macro FieldError(CD,errtxt,mem);
%if "&errtxt"="" %then %let errtxt=Wops;
%if "&CD"="" %then %let CD=WopsAgain;
imperr = "&errtxt";
impfield = &CD;
call symput('imperr',imperr !! impfield);
nr = _n_;
output &mem;
*put '---------------------' imperr= impfield= "&mem";
%mend;

%macro FieldWarning(CD,Warntxt,mem);
%if "&Warntxt"="" %then %let Warntxt=Wops;
%if "&CD"="" %then %let CD=WopsAgain;
impWarn = "&Warntxt";
impfield = &CD;
call symput('impWarn',impWarn !! impfield);
nr = _n_;
output &mem;
*put '---------------------' impWarn= impfield= "&mem";
%mend;


Note : /* Mention name of error log / warning log dataset as : &mem */



Let me know if anyone has any suggestion / doubts on this .

Happy to help , Chow !!



Tag dataset to SAS share server, how to browse that dataset, SQL distinct identifier


This code run both on windows and Mainframe servers.


Attach a dataset to SAS Share server, (Run this after starting the share server)
ALLOCATE LIBRARY Orgdata 'BWZ0.LA.SASDATA.ORGDATA' SERVER=SHARE01;
DATALIBS Orgdata;


Read data set tagged to a SAS share server
LIBNAME dates 'BWZ0.LA.DATE.STORAGE' server=share01 DISP=SHR



SQl query for finding non-unique values:

proc sql;
create table repeat as
SELECT DISTINCT year,week_nr , teamid from La130.Wdata
group by year,week_nr, teamid
HAVING count(*) > 1 ;
quit;


Ways to remove duplicates in SAS SQL :

proc sort data=LA130.WATA out=nodups5
nodupKEY;
by TEAMID DATENUM;
RUN;

Funtion to trigger SAS job from javascript / html (SMART work):

function gotohome()
{
open("/tst-bin/broker?_service=tstnorm&_program=tstsrc.llastart.sas&_debug=0",target="lla_main");
}

NOTE: Broker must be installed on the SAS server
NOTE: Change the service name accordig to your project
NOTE: program name is like: Servicename.programname.sas
NOTE: Target is the HTML page lable/name.
NOTE: Give debug=131 for getting the SAS log


I have few more examples to call SAS broker with various parameters.
/*--------------------------------------------------------------+
| go to Next page |
+--------------------------------------------------------------*/

function next()
{
var _flag;
var year_nr =yearnr[0];
var week_nr =weeknr[0];
_flag = 'N';
alert(year_nr);
alert(week_nr);
open("/tst-bin/broker?_service=tstnorm&_program=tstsrc.lla500.sas&_debug=0&_flag="+_flag+
"&_year="+year_nr+
"&_week_nr="+week_nr+"",
target="lla_main");
}


/*--------------------------------------------------------------+
| go to Previous page |
+--------------------------------------------------------------*/

function previous()
{
var _flag;
var year_nr =yearnr[4];
var week_nr =weeknr[4];
_flag = 'P';

alert(year_nr);
alert(week_nr);


open("/tst-bin/broker?_service=tstnorm&_program=tstsrc.lla500.sas&_debug=0&_flag="+_flag+
"&_year="+year_nr+
"&_week_nr="+week_nr+"",
target="lla_main");
}



This post is mostly for myself , but if you find few pieces useful here do post comments.



Upload and Download to Mainframe from Windows server !! SAS Upload&Down

Its been a while since i posted, had been busy with ma-life :) .

Here is the thing pals, Njoy. Code is self explanatory but if you face trouble do write in comments :

1st logon to you mainframe server:

data _NULL_;
usrid= upcase(sysget('username')); /* Symget will get the user name you are using on windows server */
call symput('usrid',usrid);
run;
options comamid=tcp;
%let mvs0=mvs3ibm.sys.nb.se 5308; /* mention the ip address of mainframe along with port number */
signon mvs0.spawner USER=&usrid noscript pw=_prompt_; /*password will be prompted*/
rsubmit;
options comamid=xms2;
endrsubmit;


A)
/* Use below code for Uploading to mainframe, Note, inlib=work, change to any libname if you have any thing local *******/

rsubmit;
OPTIONS NOERRORABEND;
LIBNAME MVS 'BLL0N.SAS.CENTRAL.DATA' DISP=old; /* mention your mainframe dataset name here */
proc upload inlib=work outlib=mvs;
select Formats /memtype=data;
run;
endrsubmit;
signoff;


B)
/**********For downloading any thing from mainframe */

rsubmit ;
OPTIONS NOERRORABEND;
LIBNAME MVS 'BLL0N.SAS.CENTRAL.DATA' disp=shr; /* mention your mainframe dataset name here */
proc download inlib=mvs outlib=work V6TRANSPORT;
select weekdata /memtype=data;
run;
endrsubmit;
signoff;


Hope it helps, comments are helpful in getting motivation for future post so do not shy away !

Chow!!



Saturday, December 22, 2012

SAS Music composer !! The CODE way :)

Simply using the this macro to create the octaves (More than that , but still named octaves)

Thanks SAS , now i can be AR Rehman some day.




%let pc=1.25;
%macro df3(note,octave,length);
select(¬e.);
when('A') call sound(55*(2**&octave.),&length.*160*&pc.);
when('A#') call sound(58*(2**&octave.),&length.*160*&pc.);
when('Bb') call sound(58*(2**&octave.),&length.*160*&pc.);
when('B') call sound(62*(2**&octave.),&length.*160*&pc.);
when('C') call sound(65*(2**&octave.),&length.*160*&pc.);
when('C#') call sound(69*(2**&octave.),&length.*160*&pc.);
when('Db') call sound(69*(2**&octave.),&length.*160*&pc.);
when('D') call sound(73.5*(2**&octave.),&length.*160*&pc.);
when('D#') call sound(73.5*(2**&octave.),&length.*160*&pc.);
when('Eb') call sound(78*(2**&octave.),&length.*160*&pc.);
when('E') call sound(82*(2**&octave.),&length.*160*&pc.);
when('F') call sound(87*(2**&octave.),&length.*160*&pc.);
when('F#') call sound(92.5*(2**&octave.),&length.*160*&pc.);
when('Gb') call sound(92.5*(2**&octave.),&length.*160*&pc.);
when('G') call sound(98*(2**&octave.),&length.*160*&pc.);
when('G#') call sound(104*(2**&octave.),&length.*160*&pc.);
when('Ab') call sound(104*(2**&octave.),&length.*160*&pc.);
when('R') call sleep((&length./3)*&pc.,1);
otherwise;
end;
%mend;


/* Then ask any music composer to compose the song for you, i would really love DDLJ
the greatest bollywood love song :),
but for now i found this some where on web */

/* Old Mc-Donald Had a farm */

data _null_;
do i=1 to 2;
%df3('C',3,1);
%df3('C',3,1);
%df3('C',3,1);
%df3('G',2,1);
%df3('A',3,1);
%df3('A',3,1);
%df3('G',2,2);
%df3('E',3,1);
%df3('E',3,1);
%df3('D',3,1);
%df3('D',3,1);
%df3('C',3,2);
if i=1 then do;
%df3('R',1,2);
%df3('G',2,2);
end;
end;
%df3('G',2,.5);
%df3('G',2,.5);
%df3('C',3,1);
%df3('C',3,1);
%df3('C',3,1);
%df3('G',2,.5);
%df3('G',2,.5);
%df3('C',3,1);
%df3('C',3,1);
%df3('C',3,2);
%df3('C',3,.5);
%df3('C',3,.5);
%df3('C',3,1);
%df3('C',3,.5);
%df3('C',3,.5);
%df3('C',3,1);
%df3('C',3,.5);
%df3('C',3,.5);
%df3('C',3,.5);
%df3('C',3,.5);
%df3('C',3,1);
%df3('C',3,1);
%df3('C',3,1);
%df3('C',3,1);
%df3('C',3,1);
%df3('G',2,1);
%df3('A',3,1);
%df3('A',3,1);
%df3('G',2,2);
%df3('E',3,1);
%df3('E',3,1);
%df3('D',3,1);
%df3('D',3,1);
%df3('C',3,3);
run;


Just to clarify, u can decrease the value of 'pc' macro variable if the song plays too slowly.


Wednesday, April 18, 2012

SAS Macros Quick and useful TIP

HI Folks,

I have a quick tips for you on macros. Most of the code should be self explanatory, but do post comments if you need yelp/suggestion !

Scenario : you are testing some code in Windows SAS session using macros ! need help to see what all macro variables you had created and you wana clear them all (ofcourse just close the session and restart like an dork !)

use:
%put _all_; /* To see all kinds of macro variables that exist in session */

%put _global_ ; /* To only global macro variables that exist in session */
%put _local_ ; /* for local variables , if any */

but that was not tricky, but indeeed helpful.

use the macro ! to clear all global macro variables in a session (this will also clear all the automatic global variables but nothing important ):


COde
%let mymac1=Krypton;
%put &mymac1;
%macro delvars;
data vars;
set sashelp.vmacro;
run;
data _null_;
set vars;
if scope='GLOBAL' then
call execute('%symdel '||trim(left(name))||';');
run;
%mend;
%delvars;
%put &mymac1;



Thanks,
Kr !!