Δευτέρα 18 Μαΐου 2015

Mining frequent subsequences in Python

So, I was trying to reimplement [Zaki et al 2009, "VOGUE: A Variable Order Hidden Markov Modelwith Duration based on Frequent Sequence Mining"] where they describe VGS, an algorithm for frequent k-subsequence mining while keeping track of the gap symbols. The author has the VGS algorithm implemented in python already, but I decided to implement my own "simulation" of VGS in a map-reduce way.

Here's the code of the function, it accepts a sequence as a python list of objects (tested only with strings though), a minimum support minsup and a maximum gap maxgap between symbols, and it returns a record of the kind {k-subsequence : , (Frequency : , Gap_Symbols : }:


An example (from the VOGUE paper):
>>> seq = list("ACBDAHCBADFGAIEB")
>>> print find_frequent_subsequences(seq,3, 2, 3)

{('A', 'C', 'A'): (2, ['H']), ('C', 'B', 'A'): (2, []), ('C', 'B', 'D'): (2, []), ('A', 'B', 'A'): (2, ['C', 'H', 'C']), ('A', 'C', 'B'): (2, ['H']), ('A', 'B', 'D'): (2, ['C', 'H', 'C']), ('C', 'D', 'A'): (2, ['B', 'B', 'A']), ('D', 'A', 'B'): (2, ['F', 'G']), ('A', 'C', 'D'): (2, ['H']), ('A', 'D', 'A'): (2, ['C', 'B']), ('B', 'D', 'A'): (2, ['A'])}
 

I hope someone finds that useful. Sorry if I re-invented the wheel. I am sure there are faster ways to do it but I needed to build on that.

Explanation:

As described in [Zaki, 2001], map every symbol s in our sequence S, to its position in S. For example the first  2 items will be: [(A, 1), (C, 2), ...].

Then we reduce to F1 = {A: [1, 5, ...], B: [3, ...] } and filter out those elements that appear less tha
n minsup.

Then combine F1 with itself. I.e produce [(AB, [1,3]), (AB, [5,8]), ...] keeping only those elements that have a maximum gap of maxgap. Subsequently, reduce that again to F_k = {AB: [[1,3],[5,8],...],...} and filter out based on the minsup value.

Repeat the same by combining F_k with F_1 until you have reached the desired subsequence length k.

Δευτέρα 29 Σεπτεμβρίου 2014

An Introjucer .desktop file for gnome.

An excellent class library for C++ VST development is Julian Storer's JUCE. It can produce plugins in the VST (2.x, 3.x), RTAS, AAX and  AU formats  and is cross platform across Windows, Linux and Mac OSX.

The best way to program in Juce is through Juce's own Project Management tool Introjucer which relies under `../extras/Introjucer' when cloning JUCE from its GitHub repository.

After installing it by running make in `../extras/Introjucer/builds/Linux' and sudo install Introjucer /usr/local/bin (or your desired installation path) you can copy `../extras/Introjucer/Source/BinaryData/juce_icon.png' to your `~/.icons' directory and then put the following .desktop file into your  `~/.local/share/applications' folder:

#!/usr/bin/env xdg-open
[Desktop Entry]
Type=Application
Encoding=UTF-8
Name=Introjucer
Comment=JUCE's project-management tool and secret weapon.
Exec=PATH_ΤΟ_BIN_DIR/Introjucer
Icon=PATH_TO_YOUR_HOME_DIR/.icons/juce_icon.png
and then you can easily run Introjucer from your gnome-shell.

Πέμπτη 11 Ιουλίου 2013

Fun with finite-state machines, graphviz and python

Hello,

I have come in need for a grammar parser in python (context-free). While there are plenty in python, many of them are very undocumented, some seem to have become orphaned projects and `yapps2` and `pyparsing` that I tried seem to be a bit limited for what I need them (or most probably I'm a noob). So, also in order to refresh my memory on grammars, I decided to make my own parser in python.

The first step was, as I saw it to go back to the roots, to finite-state machines. It happened that it is quite a bit of fun especially because it allows me to have a dip into `graphviz` and `pydot` for pretty plotting the state machines.

Here are some examples from "Elements of the Theory of Computation - H.Lewis, Ch. Papadimitriou, 2nd ed.


The first one is for the language L(M) = { w \in (a,b)* : w has an even number of b's}


And the second one decides the language L(M) = { w \in (a,b)* : w does not have three consecutive b's }


And here is the code: https://gist.github.com/mmxgn/5973875

Σάββατο 29 Ιουνίου 2013

Musical CSPs with Mingus and python-constraint

Hello,

I was looking for a nice python library for constraint programming. Unfortunately, the only one I could find was python-constraint which seemed quite nice and straightforward, but after using it for a while, I find it very limited.

So, as a first example, I tried solving the pretty classic musical CSP described at the Strasheela Examples page as the 'All Interval Series' which is taken from music serialism.

The problem states that we want to put a series of all different pitch classes on the chromatic scale, where each interval appears exactly once. We also take account for inversely equivalent intervals.

I used python-constraint in order to construct my CSP by using pure python syntax, and mingus to create a midi track and render it to .pdf, .png and .mid files.

Here's the code:
https://gist.github.com/mmxgn/5891884

Here's the output score with lilypond:

and the link to the midi file.

Κυριακή 28 Απριλίου 2013

Python generator awesomeness: SEND+MORE=MONEY

Hello,

One awesome thing with python is its "yield" keyword and the notion of generators. It can be applied in order to program in the functional and logic paradigms.

For example, let's take the following problem:

You have the expression:
SEND + MORE = MONEY
where every letter corresponds to a distinct digit (0 to 9, except S and M that cannot be 0). Find the corresponding digits for each of the letters so that the above equation holds.

So we have the variables S,E,N,D,M,O,R,Y.
With the domains:

S,M = [1,..., 9] 
E,N,D,O,R,Y = [0,...,9]
And the constraints:
S,E,N,D,M,O,R,Y distinct. 
SEND + MORE = MONEY


A beautiful way to do this is with python generators. For example, this can be solved in a single python statement as such:

    solutionGen = (\
        (s,e,n,d,m,o,r,y)\
        for s in range(1,10)\
        for m in range(1,10)\
        for e in range(0,10)\
        for n in range(0,10)\
        for d in range(0,10)\
        for o in range(0,10)\
        for r in range(0,10)\
        for y in range(0,10)\
        if len(set([s,e,n,d,m,o,r,y])) != len([s,e,n,d,m,o,r,y]) and\
        s*1000 + e*100 + n*10 + d +\
        m*1000 + o*100 + r+10 + e ==\
        m*10000 + o+1000 + n*100 + e*10 + y)
Just like that. You can get a single solution to this problem to SingleSolution with:
SingleSolution = solution.next()
Or every possible solution in a list with:
Solutions = [i for i in solution]
Magic.

Of course the above is not really efficient. This gave me a runtime of 122s on my core 2 duo e8400.

What you can do to improve it a bit? Well, for starters, replace the domains ( i.e range(0,10) ) with two constants out of the loop, i.e. Dom1 = range(1,10) and Dom2=range(1,10) so that they are not computed at each iteration. Then, a solution is more probable to appear at a range of bigger integers (we want two 4-digit numbers added to give a 5-digit number) so we can reverse the way we search the domains. So let's replace the above with:
    Dom1 = range(9,0,-1)
    Dom2 = range(9,-1,-1)
This, will give me a runtime of 16s.

Can we do more (less)? Yes. If you watch carefully to the above snipplet, we test every possible assignment for a solution while we could reduce the search space by not testing values that could not appear in a solution in the first place. For example, when S gets a value of '3', E cannot appear with the same value in the solution. So, while we search for a solution we sould "propagate" the distinction constraint. Using the yield keyword we can write the above snipplet as:

def solution():
    Dom1 = range(9,0,-1)
    Dom2 = range(9,-1,-1)
    for s in Dom1:
        for m in Dom1:
            if m == s:
                continue
            for e in Dom2:
                if e in [s,m]:
                    continue
                for n in Dom2:
                    if n in [s,m,e]:
                        continue
                    for d in Dom2:
                        if d in [s,m,e,n]:
                            continue
                        for o in Dom2:
                            if o in [s,m,e,n,d]:
                                continue
                            for r in Dom2:
                                if r in [s,m,e,n,d,o]:
                                    continue
                                for y in Dom2:
                                    if y in [s,m,e,n,d,o,r]:
                                        continue
                                    if C2(s,e,n,d,m,o,r,y):
                                        yield (s,e,n,d,m,o,r,y)

 Which gave me a runtime of  0.56s.

Not bad.

Πέμπτη 11 Απριλίου 2013

9 months of Army Service

On 9th of April, I finally completed my 9 month mandatory Service to the Greek Army Armed Forces. It actually seemed like a century of service.

I served:

  • One month of training at the Engineering Corps training camp in Nafplio. First Batallion Third Company.
  • Three months at the Hellenic Presidential Guard Company of Administration.
  • Five months at the Hellenic Army General Staff Batallion as a soldier of the Honor Guard. 
So, now it is over. Time to pick up where I left off: http://9gag.com/gag/7044336.

See you around.

Πέμπτη 31 Μαΐου 2012

PCL&RCL Oz code

Hello,

Just a small update.

Code for my PCL (Propositional Clausal Logic) and RCL (Relational Clausal Logic) theorem prover and model searcher can be found at my github:

git://github.com/mmxgn/generic-logic.git

I was hoping to finish it before giving the link, but I am going to give it anyway.
I hope the usage becomes obvious at the last lines of logic2.oz.


There are some personal life issues that will forbid me to work on it for a long time.