Trie

In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Read more about tries in wikipedia. In this post we will implement a trie in Java. You can visualize a trie as below:

To implement a trie, first of all, we need a representation for the nodes, the circles in the diagram above. Below is the class that we use to encapsulate these nodes:

  private static class Node {
    public char letter;
    public boolean ends;
    
    public Node(char letter, boolean ends) {
      this.letter = letter;
      this.ends = ends;
    }
    
    public Map<Character, Node> children = new HashMap<Character, Node>();
  }

In the diagram, we see that the partial word formed till each node is stored in the node. We will not be following this, we will just store the letter in each node. To indicate that a word ends at this particular node, each node will also be associated with a boolean ends which indicates as to whether a word terminates at this particular letter/node, for example, say we are storing the word “cat”, then the node containing ‘t’ will have the ends variable set to true. Also, each node has a children map, which contains all the letters extending from the particular node. In the diagram above, the node ‘t’ will have two children ‘o’ and ‘e’.

With this knowledge, let us write an empty trie class as below:

public class Trie {
  private Map<Character, Node> root = new HashMap<Character, Node>();
}

Like in the diagram above, we do not start with an empty node for the root, but our root is a map of all root elements. In our trie, if we have all the words in the English dictionary, then the root will have a maximum of 26 entries. Instead of using a map, we can optimize this by using an array of size 26 and using the character values as the indices to the appropriate letters.

Now, let us implement the add method, the method that will be used to add words to the trie.

  public void add(String word) {
    char[] chars = word.toCharArray();
    
    //Get the first letter in the word and see whether it exists as an entry in the root.
    char first = chars[0];

    Node current;
    if (root.containsKey(first)) {
      //Root already contains the letter, hence treat that as the current element.
      current = root.get(first); 
    } else {
      //Letter is not already present as an entry in the root, hence create it and add it to root.
      Node node = new Node(first, false);
      root.put(first, node);
      current = node;
    }

    //Iterate over the other letters.
    for (int i = 1; i < chars.length; ++i) {
      char aChar = chars[i];

      if (current.children.containsKey(aChar)) {
        //This letter is already present in the tree as a child. Hence, just update the current to point to the node
        //represented by this letter.
        current = current.children.get(aChar);
      } else {
        //This letter is not present as a child of the current element, hence add it.
        Node node = new Node(aChar, false);
        current.children.put(aChar, node);
        current = node;
      }
    }

    //Our word ends here, mark the current node as the terminal node.
    current.ends = true;
  }

Now, let us implement a method to check whether a sequence of letters exists in the trie.

  public boolean contains(String sequence) {
    char[] chars = sequence.toCharArray();

    char first = chars[0];

    //If the first letter is not present as a key in the root, there is no chance of this sequence being present in the trie.
    if (!root.containsKey(first)) {
      return false;
    }

    Node current = root.get(first);

    for (int i = 1; i < chars.length; ++i) {
      char aChar = chars[i];

      //This letter is not present in the trie, hence the trie does not have this sequence.
      if (!current.children.containsKey(aChar)) {
        return false;
      }

      current = current.children.get(aChar);
    }

    //We came out of the loop, it means that all the letters are there in the trie, in the appropriate order.
    return true;
  }

Now comes the function which fetches all words from the trie corresponding to the passed in prefix.

  public List<String> get(String prefix) {
    List<String> words = new LinkedList<String>();

    char[] chars = prefix.toCharArray();

    char first = chars[0];

    //First letter is not present as a key in the root, hence there cannot be any words in this trie corresponding to the
    //prefix.
    if (!root.containsKey(first)) {
      return words;
    }

    Node current = root.get(first);

    for (int i = 1; i < chars.length; ++i) {
      char aChar = chars[i];

      //No entry in the tire for this letter, hence there cannot be any words in the trie for the passed in prefix.
      if (!current.children.containsKey(aChar)) {
        return words;
      }

      current = current.children.get(aChar);
    }

    //At this point, we can be sure that there are words in the trie corresponding to the passed in prefix. Now plow down 
    //the depths of the tree and form all the words corresponding to the prefix.
    return formWords(current, prefix, words);
  }

  private List<String> formWords(Node node, String prefix, List<String> words) {
    //A word can be formed at this point, hence add it to the container.
    if (node.ends) {
      words.add(prefix);
      
      //This branch of the tree ends here as there are no more children. Hence return.
      if (node.children.size() == 0) {
        return words;
      }
    }
    
    //Recursively go through all the children of the node and form words.
    for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
      Node _node = entry.getValue();
      formWords(entry.getValue(), prefix + _node.letter, words);
    }

    return words;
  }

That is all there is to it. The code does not do any error checking, only happy path is taken into account. Full source code is present in github.

Iterative code to generate permutations

While trying to solve scribd’s bot puzzle, I wanted to generate permutations of a passed in array.

Say the array has 1,2,3. What are the permutations possible?
321
231
213
312
132
123

If you have to do it with pen and paper, how would you do it?

Say you are currently processing 1. What are the permutations of 1? 1 right?

Now, you are processing 2. What are the permutations of 1 and 2? 12 and 21 right? How did you arrive at this? You added 2 to the beginning of 1 and end of 1.

Now, you are processing 3. What are the previous two permutations that we have? 12 and 21. Now we have to fit 3. What do we do? Add 3 to the beginning and end of both, i.e 312 123 and 321 213. Along with the ends, we also add 3 in between the digits, i.e 132 and 231. There you go, you have all permutations of 1,2 and 3.

Below code extends this idea to any number of elements.

function getAllPermutations(ary) {
    var len = ary.length;
    //console.log('Length:' + len);
    var i = 0, j = 0, k = 0;
    var elem, containerElem;
    //console.dir(ary[0]);
    var container = [[ary[0]]];
    var copied;

    /*
    console.log('------------>------------');
    for (i = 0; i < len; ++i) {
        elem = ary[i];
        console.dir(elem);
    }
    console.log('------------>------------');
    */

    for (i = 1; i < len; ++i) {
        //console.log('I is:' + i);
        elem = ary[i];
        copied = copy(container);
        container = [];
        for (j = 0; j < copied.length; ++j) {
            containerElem = copied[j];
            ret = helper(elem, containerElem);

            for (k = 0; k < ret.length; ++k) {
                container.push(ret[k]);
            }
        }
    }

    return container;

    function helper(elem, ary) {
        /*
        console.log('In helper start');
        console.log('Elem is:');
        console.dir(elem);
        console.log('Ary is:');
        console.dir(ary);
        */
        var len = ary.length;
        var elem;

        var container = [];
        var copied;
        for (var i = 0; i < len; ++i) {
            copied = copy(ary);        
            copied.splice(i, 0, elem);
            container.push(copied);
        }
        copied = copy(ary);        
        copied.push(elem);
        container.push(copied);
        /*
        console.log('Result is:');
        console.dir(container);
        console.log('In helper end');
        */
        return container;
    }
}

function copy(ary) {
    var copied = [];
    var len = ary.length;

    for (var i = 0; i < len; ++i) {
        copied.push(ary[i]);
    }
    
    return copied;
}

Migrating mercurial repos from bitbucket to github

I recently moved my projects from bitbucket to github. I have absolutely no reason to move other than checking out github.

Below are the steps that I used to move my repos:
1. Create an empty git repo in both github as well as bitbucket for the merucrial repo in bitbucket that you want to migrate.

2. Enable hg-git tool and commit to the newly created git repo in bitbucket.

Say for example, your mericurial repo is foo and and git repo is foo-git, the command would look like this:
C:\Personal\Projects\foo>hg push git+ssh://git@bitbucket.org:abhirama/foo-git.git

3. Now create a folder called foo and clone the bitbucket git repo to this folder.
C:\Personal\Projects\git>git clone https://abhirama@bitbucket.org/abhirama/foo-git.git .\foo

4. Merge changes from github repo foo to the repo foo cloned from bitbucket.
C:\Personal\Projects\git\foo [master]> git pull git@github.com:abhirama/foo.git

5. Once, done, change the origin of this repo from bitbucket to github.
C:\Personal\Projects\git\foo [master]> git remote rm origin
C:\Personal\Projects\git\foo [master]> git remote add origin git@github.com:abhirama/foo.git

6. Now push this repo to github.
C:\Personal\Projects\git\foo [master]> git push –set-upstream origin master

Voila, your repo is in github now :).

Controlling soundcloud from your keyboard

I have soundcloud open in one of my tabs while in the other one I am reading an interesting article. The currently paying track ends and a new one starts and this sucks. I want to skip this and move to the next track. How I wish pressing some key while remaining in the same tab would take me to the next track. Some wishes do come true :) and in the past few days I brought this idea to life. I wrote a chrome plugin which exactly does this and some more. Below are the keybindings currently supported:

1. <alt> + k – Jump to the next track.
2. <alt> + j – Jump to the previous track.
3. <alt> + p – Pause the currently playing track or resume the paused track. If no track is playing, play the first track in the page.
4. <alt> + m – If logged in, favorite the currently playing track.

Some improvements which I have in mind:
1. Making the keyboard bindings configurable.
2. Removing jQuery dependency from the code.

Give it a spin and let me know how it works for you.

Script to download Coursera videos

Edit: I do not maintain this script anymore, dgorissen forked this and made it better – https://github.com/dgorissen/coursera-dl

After taking two Coursera classes(db and algo), I wanted a simpler way to download the lecture assets(video files, pdf, ppt etc) from the site.

The features that I was looking for in the downloader:
1. Bulk download from the command prompt.
2. The downloader should recreate the structure present in the video listings page i.e it should create directories corresponding to each weekly topic and their sub topics and download the files into the appropriate directories.
3. The downloader should be smart as in when it is run each time from the command prompt, it should know the assets previously downloaded and download only the newly updated assets. Speaking in programmer terms, it should diff between all the previous downloads and the assets in the current page and download only the newly added ones.

Hence, my free time during the last few days were dedicated to scripting this. The Python script that I wrote for this can be downloaded from github. Extract the contents of the zipped file to the directory where you want the assets to be downloaded and run the python script. More info is present in the readme.txt file accompanying the script. The script recreates the structure present in the video listings page and downloads the assets to the appropriate directories.

The image below shows the directories created and some of the files downloaded for automata course.

Compare the above with the course structure present in the video listings page of automata course:

Whenever new lessons are posted, just run the script and the script takes care of downloading the updated contents. It keeps track of already downloaded files and hence only downloads the newly updated contents.

Give it a spin and let me know how it works for you.

PHP and string zero

Consider the below php code:


$foo = <value from an HTTP param>;

if ($foo) {//make sure this parameter is set

  //do your operations

}

You are in for a rude shock if your HTTP parameter has a value 0. PHP evaluates a string 0 as boolean false.

Edit: I was going through some PHP internals and happened to land on this function which evaluates boolean – http://lxr.php.net/xref/PHP_5_4/Zend/zend_operators.c#501. You can see the check being made for string zero here.

Image masher

A python script that recursively processes png and jpeg images present in the passed in directory and losslessly compresses them using deflopt, optipng, pngcrush, pngoptimizer and jpegtran. This script assumes that you have these binaries in your windows path. Grab the script from github.

Example:

python mash.py C:\foo

PS:If this interests you, take a look at the command line client that I wrote for yahoo!’s smush.it service.

Follow

Get every new post delivered to your Inbox.