Thursday, 4 February 2010

Calculating conditional pmf matrices in Python (NumPy)

Here is the thing that has tormented me for some time now.

I have a joint probability table with shape, for example, (1,2,3,4,5,6).
I want to calculate the probability table conditioned on fixed values for some of the dimensions, for decision-making purposes. We define the values we want

The code I have come up with so far is the following (the input is a dictionary "vdict" of the form {'variable_1': value_1, 'variable_2': value_2, ...}):


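for i in vdict:
    dim = self.invardict.index(i)    # the index of the dimension that our variable resides in
    val = self.valdict[i][vdict[i]]  # the value we want it to be
    d = d.swapaxes(0, dim)           # bring that dimension to axis 0
    d = array([d[val]])              # keep only the desired value (this builds a new array)
    d = d.swapaxes(0, dim)           # put the dimension back on its original axis
...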


So, what I currently do is:

1. I translate each variable to its corresponding dimension in the cpt.
2. I swap the zeroth axis with the axis I found before.
3. I replace the whole 0-axis with just the desired value.
4. I put the dimension back on its original axis.
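
The steps above can be sketched as a small self-contained example. The names fix_value, variables and joint are hypothetical and stand in for the class attributes used in my real code; the table is a toy one with three variables.

```python
import numpy as np

def fix_value(table, dim, val):
    """Keep only index `val` along axis `dim`, leaving a length-1 axis there."""
    swapped = table.swapaxes(0, dim)     # step 2: bring the axis to the front
    kept = np.array([swapped[val]])      # step 3: keep one slice (builds a new array)
    return kept.swapaxes(0, dim)         # step 4: move the axis back

# Hypothetical variable names, one per axis of the table.
variables = ['a', 'b', 'c']

# Toy joint table over three variables with 2, 3 and 4 possible values.
joint = np.arange(24, dtype=float).reshape(2, 3, 4)
joint /= joint.sum()

dim = variables.index('b')               # step 1: variable name -> axis index
cond = fix_value(joint, dim, 2)
print(cond.shape)                        # (2, 1, 4)
```

The result still has one axis per variable; the fixed variable simply has length 1.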

Now, the problem is that, in order to do step 3, I have to (a) compute a sub-array
and (b) put it in a list and convert that back into an array to get my new array.

The thing is, both (a) and (b) create new objects instead of just using references to the old ones. If d is very large (which happens to me) and the methods that use d are called many times (which, again, happens to me), the whole thing becomes very slow.

So, has anyone come up with an idea that could substitute for this little piece of code and run a lot faster? Maybe something that would let me calculate the conditionals in place.
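
One possible direction, as a minimal sketch rather than a tested replacement for my real code: NumPy basic slicing returns a view, so indexing the chosen axis with a length-1 slice keeps the dimension without copying any data. The helper name conditionalize_view is hypothetical.

```python
import numpy as np

def conditionalize_view(arr, dim, val):
    """Hypothetical helper: keep index `val` along axis `dim` as a length-1 axis."""
    index = [slice(None)] * arr.ndim     # start with "take everything" on every axis
    index[dim] = slice(val, val + 1)     # a length-1 slice keeps the axis in place
    return arr[tuple(index)]             # basic slicing returns a view, not a copy

d = np.arange(24, dtype=float).reshape(2, 3, 4)
sub = conditionalize_view(d, dim=1, val=2)
print(sub.shape)                         # (2, 1, 4)
print(np.shares_memory(sub, d))          # True
```

Because sub is a view, writing into it also changes d. If a normalized conditional pmf is needed, dividing by sub.sum() produces a new, much smaller array.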

Edit: I replaced the swap, slice and swap-back lines in the loop body with the code below:

d = conditionalize(d, dim, val)
where:

from numpy import array

def conditionalize(arr, dim, val):
    arr = arr.swapaxes(dim, 0)                  # bring the desired dimension to axis 0
    shape = arr.shape[1:]                       # shape of the sub-array when we omit the desired dimension
    count = int(array(shape).prod())            # number of elements once the desired dimension is omitted
    arr = arr.reshape(array(arr.shape).prod())  # flatten the array (NumPy copies here if the swapped array is not contiguous)
    arr = arr[val*count:(val+1)*count]          # take the needed elements
    arr = arr.reshape((1,)+shape)               # the desired sub-array shape
    arr = arr.swapaxes(0, dim)                  # fix dimensions
    return arr
What used to take 15 minutes to complete now takes only about 6 seconds!