Tuesday, February 28, 2012

Fifty Nifty Pytools

In this post, we look at several simple but useful Python code snippets. Although the title says "fifty", the hope is that the count will grow much larger over time.

[Snippet 0001] Working in finance in particular calls for strong date libraries and tools. Python gives us datetime, which is excellent, and additional functionality can be built on top of it. When one has to tie together data from Excel spreadsheets, databases (e.g. MySQL), and other sources, and maintain a single data set that can be used for data mining or back-testing, having good date functions at hand becomes critical. Our first snippet below is one such function. More date-related snippets will likely follow.

import os, sys;
from datetime import *;



def dtConv(x,s="OBJ"): # converts dates from one form to another
 def s2(n): # nested helper function. returns str(n) if n>=10 else "0"+str(n)
  if n<10: return "0"+str(n);
  return str(n);

 # first, parse the input depending on type, collecting year, month, day as int
 # styles (s) supported: 
 # s="OBJ" return type is a date object. the default return type
 # s="TXT" return type is of the form "yyyymmdd" e.g. "20120131"
 # s="XL"  return type is of the form "m/d/yyyy" e.g. "2/3/2012"
 # s="XL0" return type is of the form "mm/dd/yyyy" e.g. "02/03/2012"
 # s="DB"  return type is of the form "yyyy-m-d" e.g. "2012-2-3"
 # s="DB0" return type is of the form "yyyy-mm-dd" e.g. "2012-02-03"
 if type(x)==date: y,m,d=x.year,x.month,x.day;
 else: 
  if x.count("/")==2: y,m,d=int(x.split("/")[2]),int(x.split("/")[0]),int(x.split("/")[1]);
  if x.count("-")==2: y,m,d=int(x.split("-")[0]),int(x.split("-")[1]),int(x.split("-")[2]);
  if x.count("/")==0 and x.count("-")==0 and len(x)==8: y,m,d=int(x[:4]),int(x[4:6]),int(x[6:]);
  
 # next, we generate output in the form requested
 if s=="OBJ": return date(y,m,d);
 if s=="XL": return "/".join([str(m),str(d),str(y)]);
 if s=="DB": return "-".join([str(y),str(m),str(d)]);
 if s=="XL0": return "/".join([s2(m),s2(d),s2(y)]);
 if s=="DB0": return "-".join([s2(y),s2(m),s2(d)]);
 if s=="TXT": return s2(y)+s2(m)+s2(d);
 return -1;



Examples of use:
dtConv("1/2/2012") gives datetime.date(2012,1,2)
dtConv("1/2/2012","DB0") gives "2012-01-02"
dtConv("1/2/2012","TXT") gives "20120102"
dtConv("20120102") gives datetime.date(2012,1,2)
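
The same conversions can also be sketched on top of the standard library's own parsing and formatting calls. What follows is a minimal Python 3 sketch rather than the function above: the names FORMATS, parse_date and format_date are hypothetical, and it only produces the zero-padded styles.

```python
from datetime import date, datetime

# Hypothetical style table: maps the zero-padded style names used above
# to strptime/strftime format strings.
FORMATS = {
    "TXT": "%Y%m%d",    # e.g. "20120131"
    "XL0": "%m/%d/%Y",  # e.g. "02/03/2012"
    "DB0": "%Y-%m-%d",  # e.g. "2012-02-03"
}

def parse_date(x):
    """Return a date from a date object or a string in any layout in FORMATS."""
    if isinstance(x, date):
        return x
    for fmt in FORMATS.values():
        try:
            return datetime.strptime(x, fmt).date()
        except ValueError:
            continue  # not this layout, try the next one
    raise ValueError("unrecognised date: %r" % (x,))

def format_date(x, style="OBJ"):
    """Convert x to a date object ("OBJ") or to one of the styles in FORMATS."""
    d = parse_date(x)
    if style == "OBJ":
        return d
    return d.strftime(FORMATS[style])
```

Unlike dtConv, this sketch raises ValueError on input it cannot parse instead of returning -1, which tends to surface bad rows earlier when merging data from several sources.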

[Snippet 0002] A simple object example with output. We define a circular list. With ordinary list objects, indexing beyond the end of the list raises an error; clists overcome this problem by applying modular arithmetic to the index.

class clist(object):
# creates a circular list object
 num_inst=0; # enforces singleton pattern

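 def __init__(self,arg1=None): # constructor
  if arg1 is None: arg1=[]; # avoid sharing one mutable default list across calls
  if clist.num_inst==0: 
   clist.num_inst+=1;
   self.L=arg1;
  else: 
   print "cannot have more than one instance of class clist";
   self.__del__(); # note: this does not destroy the object; it is left without self.L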

 def __del__(self): # object destructor
  pass;

 def __len__(self): # get length of clist
  return len(self.L);

 def __getitem__(self,key): # get an item of clist
  pos=key%len(self.L);
  return self.L[pos];

 def __contains__(self,key):
  if key in self.L: return True;
  return False;

 def __reversed__(self): # reverses clist contents in place; returns None
  self.L.reverse();
  
 def content(self): # accessor for clist contents
  return self.L;

 def redef(self,L): # reset clist contents
  self.L=L;

# sample use:
>>> execfile("clist.py");
>>> b=clist([1,2,3,4,'a','b','c']);
>>> len(b)
7
>>> reversed(b)
>>> b.content();
['c', 'b', 'a', 4, 3, 2, 1]
>>> b.redef([1,2,3,4,5,6,7]);
>>> b.content();
[1, 2, 3, 4, 5, 6, 7]
>>> len(b);
7
>>> b[1]
2
>>> b.content()
[1, 2, 3, 4, 5, 6, 7]
>>> 'a' in b
False
>>> 1 in b
True
>>> c=clist([1,2,3]);

cannot have more than one instance of class clist
>>> b[-13]
2
>>> b[13]
7
>>>
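
The wrap-around indexing can also be sketched by subclassing the built-in list. This is a minimal Python 3 sketch, not the class above: the name CircularList is hypothetical, and it deliberately drops the single-instance restriction.

```python
class CircularList(list):
    """A list whose integer indices wrap around instead of raising IndexError."""

    def __getitem__(self, key):
        if isinstance(key, int):
            if not self:
                raise IndexError("index into an empty circular list")
            key %= len(self)  # Python's % yields a value in [0, len) here
        # Slices are passed through unchanged to the ordinary list behaviour.
        return super().__getitem__(key)

b = CircularList([1, 2, 3, 4, 5, 6, 7])
print(b[1], b[13], b[-13])  # indices 13 and -13 wrap to positions 6 and 1
```

Because it inherits from list, len, in, append and the other list operations work without further code.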

[Snippet 0003] Here, we combine the ideas of the previous two snippets to design a date class dt that behaves like the examples from snippet 0001, i.e. we can store the date any way we like and seamlessly convert from one form to another as desired. Notice how clean the object-oriented approach looks in comparison to the purely procedural technique, though of course, for more complex examples, there is probably a greater up-front price to pay in designing things the OO way as opposed to coding procedurally. The OO design is arguably easier to maintain, however. The code (dt.py) and output follow:


import os, sys;
from datetime import *;

def s2(x): # returns "03" if x==3, or "10" if x==10
 if x<10: return "0"+str(x); # i.e. adds leading zeroes as needed
 else: return str(x);

class dt(object):
 # creates a dt object and provides means to view it in different ways

 def __init__(self,x): # constructor
  if type(x)==str and x.count('/')==2: # covers XL and XL0 types
   m,d,y=x.split("/");
   self.m,self.d,self.y=int(m),int(d),int(y);
  if type(x)==str and x.count('-')==2: # covers DB and DB0 forms
   y,m,d=x.split("-"); # DB forms are ordered year-month-day
   self.m,self.d,self.y=int(m),int(d),int(y);
  if type(x)==date: self.y,self.m,self.d=x.year,x.month,x.day;
   # covers the date object format
  
 def __del__(self): # destructor
  pass;

 def OBJ(self): # returns the date object
  return date(self.y,self.m,self.d);

 def TXT(self): # returns the text representation
  m,d=s2(self.m),s2(self.d);  
  return str(self.y)+m+d;

 def XL(self): # returns the Excel date type
  return "/".join([str(self.m),str(self.d),str(self.y)]);

 def XL0(self): # returns Excel date type with leading 0s
  return "/".join([s2(self.m),s2(self.d),str(self.y)]);

 def DB(self): # returns the MySQL DB date type
  return "-".join([str(self.y),str(self.m),str(self.d)]);

 def DB0(self): # returns the MySQL DB date type with LZs
  return "-".join([str(self.y),s2(self.m),s2(self.d)]);

# sample output generated as below
>>> execfile("dt.py");
>>> a=dt("4/10/2012");
>>> a.OBJ();
datetime.date(2012, 4, 10)
>>> a.TXT();
'20120410'
>>> a.DB();
'2012-4-10'
>>> a.DB0();
'2012-04-10'
>>> a.XL0();
'04/10/2012'
>>> a=dt(date(2012,4,10));
>>> a.OBJ()
datetime.date(2012, 4, 10)
>>> a.TXT()
'20120410'
>>> a.XL0();
'04/10/2012'
>>> a.XL();
'4/10/2012'
>>> a.DB();
'2012-4-10'
>>> a.DB0();
'2012-04-10'
>>>
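
The constructor above does not accept the "yyyymmdd" text form as input. A minimal Python 3 sketch of a variant that does is given here; the class name DateValue is hypothetical, and it stores a single date object rather than separate year, month and day fields.

```python
from datetime import date

class DateValue:
    """Hypothetical variant of dt: stores a date and also accepts "yyyymmdd"."""

    def __init__(self, x):
        if isinstance(x, date):
            self.value = x
        elif x.count("/") == 2:            # "m/d/yyyy" (XL, XL0)
            m, d, y = x.split("/")
            self.value = date(int(y), int(m), int(d))
        elif x.count("-") == 2:            # "yyyy-m-d" (DB, DB0)
            y, m, d = x.split("-")
            self.value = date(int(y), int(m), int(d))
        elif len(x) == 8 and x.isdigit():  # "yyyymmdd" (TXT)
            self.value = date(int(x[:4]), int(x[4:6]), int(x[6:]))
        else:
            raise ValueError("unrecognised date: %r" % (x,))

    def TXT(self):
        return self.value.strftime("%Y%m%d")

    def DB0(self):
        return self.value.isoformat()  # "yyyy-mm-dd"

    def XL(self):
        v = self.value
        return "%d/%d/%d" % (v.month, v.day, v.year)
```

Storing a date object means invalid input such as month 13 is rejected by the date constructor at construction time rather than when OBJ() is first called.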

[Snippet 0004] Python provides native support for a number of different and useful data types. However, sometimes we may need to solve problems for which the required data types are not readily available. For instance, we may want to store data in a tree or a trie. How would we go about doing this? Luckily, Python makes it easy to build these kinds of derived data structures. We present a simple implementation below that can be used to build graphs, DAGs, tries etc., along with some examples of its use.


class node(object): # the class that creates every node object
 allNodes=[]; # static class variable to store all nodes created

 def __init__(self,val=-1): # constructor that creates node with specified tag
  self.val=val;
  self.parent=[];
  self.child=[];
  node.allNodes+=[self];

 def getVal(self): # accessor function that returns the tag value of a node
  return self.val;

 def addChild(self,n): # mutator function that connects a node to a child
  if self.getVal()!=n.getVal():
   self.child+=[n];

 def addParent(self,n): # mutator function that connects a node to a parent
  self.parent+=[n];

 def getChildren(self): # returns a list of child nodes for a node
  return self.child;

 def getChildVals(self): # returns a list of child node values for a node
  t=self.getChildren();
  r=[i.getVal() for i in t];
  return r;

 def getChildByVal(self,val): # returns a particular child node of a node by value
  p=self.getChildren();
  q=self.getChildVals();
  if val not in q: return None;
  else: return p[q.index(val)];

# Example usage

a=node(2);
b=node(3);
c=node(4);
d=node(5);
e=node(6);
f=node(7);
g=node(8);

a.addChild(b);
a.addChild(c);
b.addChild(g);
c.addChild(d);
c.addChild(e);
e.addChild(f);

b.addParent(a);
c.addParent(a);
d.addParent(c);
e.addParent(c);
f.addParent(e);
g.addParent(b);


def getDFSChain(n): # despite the name, visits nodes in breadth-first order and prints their values
 if type(n)!=node: return -1;
 r=[n];
 for i in r: r+=i.getChildren(); # r grows while it is iterated, so it acts as a queue
 for i in r: print i.getVal(),;

getDFSChain(a);

# output follows:
2 3 4 8 5 6 7
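
The order printed above (2, then 3 and 4, then their children) is a breadth-first order. For comparison, a depth-first traversal of the same tree can be sketched with an explicit stack. This is a minimal Python 3 sketch that assumes a tree without cycles; Node and depth_first are hypothetical names, and Node is a cut-down stand-in for the node class above.

```python
class Node:
    """Minimal stand-in for the node class above: a value and a list of children."""

    def __init__(self, val):
        self.val = val
        self.children = []

def depth_first(root):
    """Return node values in depth-first (pre-order) order using an explicit stack."""
    order = []
    stack = [root]
    while stack:
        n = stack.pop()
        order.append(n.val)
        # Push children in reverse so that the first child is visited first.
        stack.extend(reversed(n.children))
    return order

# Rebuild the example tree: 2 -> (3 -> 8), (4 -> 5, (6 -> 7))
nodes = {v: Node(v) for v in range(2, 9)}
for parent, child in [(2, 3), (2, 4), (3, 8), (4, 5), (4, 6), (6, 7)]:
    nodes[parent].children.append(nodes[child])

print(depth_first(nodes[2]))  # [2, 3, 8, 4, 5, 6, 7]
```

On a DAG or a general graph, a node reachable by several paths would be visited more than once by this sketch; tracking a set of visited nodes would be needed there.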

[Snippet 0005] This is a very interesting code snippet I remember reading somewhere on the web. It comes from a Google software engineer with the screen name "Tryptych". I don't remember on which website I read it, but the idea was so clean and elegant that I remember it clearly. It is an algorithm that factorizes numbers by trial division. The code follows, along with example output:

import os, sys;

def F(x): # this function factors a number x into its prime factors
 r=[];
 i=2;
 while x>1:
  while (x % i)==0: 
   r+=[i];
   x//=i; # integer division, so x stays an int
  i+=1;
 return r;

print "prime factors of 100 are: ",F(100);
print "prime factors of 1024 are: ",F(1024);
print "prime factors of 1789 are: ",F(1789);
print "prime factors of 2013 are: ",F(2013);
print "prime factors of 11204243 are: ",F(11204243);
print "prime factors of 112042431 are: ",F(112042431);
print "prime factors of 1120424311 are: ",F(1120424311);

# output follows:
prime factors of 100 are:  [2, 2, 5, 5]
prime factors of 1024 are:  [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
prime factors of 1789 are:  [1789]
prime factors of 2013 are:  [3, 11, 61]
prime factors of 11204243 are:  [19, 23, 25639]
prime factors of 112042431 are:  [3, 3, 101, 123259]
prime factors of 1120424311 are:  [1120424311]
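
For a large prime input such as the last example, the loop above tries every candidate divisor up to the number itself. A common refinement stops once the candidate exceeds the square root of what remains. This is a minimal Python 3 sketch of that variant, not the original author's code; the name prime_factors is hypothetical.

```python
def prime_factors(x):
    """Return the prime factors of an integer x >= 2 in ascending order."""
    factors = []
    i = 2
    while i * i <= x:  # no divisor above sqrt(x) is needed
        while x % i == 0:
            factors.append(i)
            x //= i  # integer division keeps x an int in Python 3
        i += 1
    if x > 1:
        factors.append(x)  # whatever remains is itself prime
    return factors

print(prime_factors(100))   # [2, 2, 5, 5]
print(prime_factors(2013))  # [3, 11, 61]
```

The inner loop is unchanged; only the stopping condition and the final check for a leftover prime differ.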

[Snippet 0006] 
