Friday, January 27, 2012

Canopy clustering algorithm

Canopy clustering is an unsupervised pre-clustering algorithm, typically run as a preprocessing step before K-means or hierarchical clustering.
Its purpose is to speed up clustering on large data sets, where running the main algorithm directly may be impractical because of the size of the data.

Algorithm

Start with a list of data points and two distance thresholds, T1 and T2, where T1 > T2.

  1. Select any point (at random) from this list to form a canopy center.
  2. Approximate its distance to all other points in the list, using a cheap distance measure.
  3. Put all the points that fall within distance T1 of the center into a canopy.
  4. Remove from the original list all the points that fall within distance T2 of the center. These points can no longer become the center of a new canopy.
  5. Repeat steps 1 to 4 until the original list is empty.

For a thorough treatment, see the paper by McCallum, Nigam and Ungar at http://www.kamalnigam.com/papers/canopy-kdd00.pdf
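
The five steps above can be sketched in JavaScript as follows. This is only an illustration under a few assumptions: the function names (euclidean, canopyCluster) and the pickIndex parameter are made up for this post, Euclidean distance stands in for the cheap approximate distance measure, and canopy membership is computed only over the points still left in the list, as the steps are worded here.

```javascript
// Hypothetical helper: Euclidean distance, standing in for the
// cheap approximate distance measure used in step 2.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Sketch of canopy clustering. pickIndex chooses the next center among the
// n remaining points (random by default; injectable for repeatable runs).
function canopyCluster(points, t1, t2, distance = euclidean,
                       pickIndex = (n) => Math.floor(Math.random() * n)) {
  if (!(t1 > t2)) {
    throw new Error('T1 must be greater than T2');
  }
  let remaining = points.slice(); // work on a copy of the original list
  const canopies = [];
  while (remaining.length > 0) {
    // Step 1: pick a canopy center from the remaining points.
    const centerIndex = pickIndex(remaining.length);
    const center = remaining[centerIndex];
    const members = [center];
    const next = [];
    remaining.forEach((point, i) => {
      if (i === centerIndex) return; // center is already a member and is removed
      const d = distance(center, point);   // Step 2
      if (d <= t1) members.push(point);    // Step 3: within T1, joins the canopy
      if (d > t2) next.push(point);        // Step 4: only points outside T2 stay
    });
    canopies.push({ center, members });
    remaining = next;                      // Step 5: repeat until the list is empty
  }
  return canopies;
}

// Usage: always pick the first remaining point so the run is repeatable.
const samplePoints = [[0, 0], [1, 0], [2.5, 0], [10, 0], [11, 0]];
const canopies = canopyCluster(samplePoints, 3, 1.5, euclidean, () => 0);
console.log(JSON.stringify(canopies));
```

With these sample inputs the point [2.5, 0] lies within T1 of the first center but outside T2, so it joins the first canopy and later becomes the center of its own. Canopies may overlap in this way, which is why the result is treated as a pre-clustering rather than a final partition.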

References:

  1. Andrew McCallum, Kamal Nigam and Lyle H. Ungar, Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching
  2. https://cwiki.apache.org/MAHOUT/canopy-clustering.html
  3. http://en.wikipedia.org/wiki/Canopy_clustering_algorithm

Wednesday, January 25, 2012

jQuery document.ready() function

jQuery document.ready() function: syntax and features

JavaScript provides the window load event (window.onload) for running code once a page has loaded. One issue with this event is that it is not triggered until every element on the page, including the images, has finished loading. A function passed to $(document).ready(), by contrast, runs as soon as the DOM is ready, without waiting for images, so event-driven code can be attached earlier.
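
For comparison, the same distinction exists without jQuery: the browser's DOMContentLoaded event fires once the document has been parsed, while the window load event waits for images and other resources. The sketch below shows one way to wait for the DOM using only standard DOM APIs. The helper name onDomReady is made up for this post, and it takes the document as a parameter only so that it can be exercised outside a browser. It is not how jQuery implements .ready().

```javascript
// Hypothetical helper: run callback once the DOM of `doc` has been parsed.
function onDomReady(doc, callback) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait for the DOMContentLoaded event.
    doc.addEventListener('DOMContentLoaded', callback, { once: true });
  } else {
    // 'interactive' or 'complete': the DOM is already available.
    callback();
  }
}

// Usage in a browser (skipped where no document exists, e.g. under Node.js).
if (typeof document !== 'undefined') {
  onDomReady(document, function () {
    console.log('DOM is ready; images may still be loading');
  });
}
```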

The syntax for .ready() is illustrated by the following example:
            $(document).ready(function(){
                        alert('This code is triggered once the DOM is ready');
                        // perform other necessary actions if any
            });

A shorter way to write the same function is as follows:
            $(function(){
                        alert('This code is triggered once the DOM is ready');
                        // perform other necessary actions if any
            });

The shorter form works because any function passed as an argument to the jQuery function, $(), is bound to the document ready event.

Another important feature of $(document).ready() is that it can be called more than once within a document. The following piece of code is therefore valid within a single file:
            $(document).ready(function(){
                        // some piece of code
            });
           
            $(document).ready(function(){
                        // other piece of code
            });
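
The jQuery documentation for .ready() states that handlers added by successive calls run in the order in which they were added. To make that behaviour concrete without needing a browser, here is a toy model of a ready queue. The names createReadyQueue, ready and fire are invented for this post, and the model is deliberately simplified; it is not jQuery's actual implementation.

```javascript
// Toy model of a "ready" queue (hypothetical, for illustration only).
function createReadyQueue() {
  const handlers = [];
  let isReady = false;
  return {
    // Register a handler; if the ready moment has already passed, run it now.
    ready(fn) {
      if (isReady) {
        fn();
      } else {
        handlers.push(fn);
      }
      return this; // allow chaining
    },
    // Simulate the DOM becoming ready: run queued handlers once, in order.
    fire() {
      if (isReady) return;
      isReady = true;
      while (handlers.length > 0) {
        handlers.shift()();
      }
    },
  };
}

// Usage: two independent registrations, as in the two blocks above.
const queue = createReadyQueue();
const log = [];
queue.ready(function () { log.push('some piece of code'); });
queue.ready(function () { log.push('other piece of code'); });
queue.fire();
console.log(log);
```

Each registration is independent of the others, which is what lets separate scripts add their own handlers without knowing about each other.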

One scenario where .ready() could be used more than once is when a common JavaScript file is referenced across a whole project and another JavaScript file is referenced only in a particular file. Each script can register its own .ready() handler, and that particular file would then include both references:
<script src="/js/common-file.js" type="text/javascript"></script>
<script src="/js/specific-file.js" type="text/javascript"></script>

References:
  1. http://think2loud.com/653-jquery-document-ready-howto/
  2. http://api.jquery.com/ready/
  3. http://docs.jquery.com/Tutorials:Multiple_$%28document%29.ready%28%29