Friday, January 27, 2012

Canopy clustering algorithm

Canopy clustering is an unsupervised pre-clustering algorithm, typically run as a preprocessing step before K-means or hierarchical clustering.
Its purpose is to speed up clustering on large data sets, where running the main algorithm directly may be impractical because of the size of the data.

Algorithm

Start with a list of data points and two distance thresholds, T1 > T2 (T1 is the loose threshold and T2 the tight one).

  1. Select any point (at random) from the list to form a canopy center.
  2. Compute its approximate distance to all other points in the list, using a cheap distance measure.
  3. Put all the points that fall within the distance threshold T1 into the canopy.
  4. Remove from the original list all the points that fall within the distance threshold T2, including the center itself. These points can no longer become the center of a new canopy.
  5. Repeat steps 1 to 4 until the original list is empty.

For an exhaustive study, see the paper by McCallum, Nigam and Ungar at http://www.kamalnigam.com/papers/canopy-kdd00.pdf
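
The five steps above can be sketched in JavaScript roughly as follows. This is only a minimal illustration, not a reference implementation: the function names, the optional pickIndex parameter (added so the random choice can be made repeatable) and the use of plain Euclidean distance in place of a cheap approximate measure are all assumptions made for the example.

```javascript
// Euclidean distance between two points given as arrays of numbers.
// Assumption: this stands in for the cheap, approximate distance measure.
function distance(a, b) {
  return Math.hypot(...a.map((value, i) => value - b[i]));
}

// Hypothetical helper: returns a list of canopies, each { center, members }.
// pickIndex(n) chooses the next center among the n remaining points
// (random by default, as in step 1).
function canopyCluster(points, t1, t2,
                       pickIndex = (n) => Math.floor(Math.random() * n)) {
  if (!(t1 > t2) || t2 < 0) {
    throw new Error('Expected T1 > T2 >= 0');
  }
  let remaining = points.slice(); // work on a copy of the original list
  const canopies = [];
  while (remaining.length > 0) {
    // Step 1: select a point from the list as the canopy center.
    const center = remaining[pickIndex(remaining.length)];
    // Steps 2 and 3: every remaining point within T1 joins the canopy.
    const members = remaining.filter((p) => distance(center, p) <= t1);
    canopies.push({ center, members });
    // Step 4: points within T2 (the center included) leave the list.
    remaining = remaining.filter((p) => distance(center, p) > t2);
  }
  // Step 5: the loop ends once the list is empty.
  return canopies;
}

// Usage: always pick the first remaining point so the run is repeatable.
const samplePoints = [[0, 0], [1, 0], [3, 0], [10, 0], [11, 0]];
const canopies = canopyCluster(samplePoints, 4, 2, () => 0);
console.log(canopies.length); // 3
```

Note that a point lying between T2 and T1 of a center joins that canopy but stays in the list, so it can end up in more than one canopy. In the usage example the point [3, 0] is a member of the first canopy and the center of the second.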

References:

  1. Andrew McCallum, Kamal Nigam and Lyle H. Ungar, Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching
  2. https://cwiki.apache.org/MAHOUT/canopy-clustering.html
  3. http://en.wikipedia.org/wiki/Canopy_clustering_algorithm

Wednesday, January 25, 2012

jQuery document.ready() function

jQuery document.ready() function: syntax and features

JavaScript provides the window load event (window.onload) for running code once a page has loaded. One issue with this event is that it is not triggered until every resource on the page, including images, has finished loading. With the $(document).ready() function, event-driven JavaScript code is guaranteed to run as soon as the DOM is ready, without waiting for those resources.
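
To see the difference in timing without jQuery, the two native browser events can be observed side by side. The sketch below is only an illustration: the watchPageEvents helper and its parameters are made up for this example. DOMContentLoaded is the native event that corresponds to the DOM being ready, and load is the later event described above.

```javascript
// Hypothetical helper: reports when each of the two events fires.
// doc and win are passed in as parameters so that the function can also be
// tried with stand-in objects outside a browser.
function watchPageEvents(doc, win, onEvent) {
  // Fires once the HTML has been parsed and the DOM is built.
  doc.addEventListener('DOMContentLoaded', () => onEvent('DOM ready'));
  // Fires later, once images and other resources have finished loading.
  win.addEventListener('load', () => onEvent('page fully loaded'));
}

// In a browser, wire the helper to the real document and window.
if (typeof document !== 'undefined' && typeof window !== 'undefined') {
  watchPageEvents(document, window, (name) => console.log(name));
}
```

On a page with large images, the first message would normally be logged noticeably earlier than the second.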

The syntax for .ready() is illustrated by the following example:
            $(document).ready(function(){
                        alert('This code is triggered once the DOM is ready');
                        // perform other necessary actions if any
            });

A short way to write the same function is as follows:
            $(function(){
                        alert('This code is triggered once the DOM is ready');
                        // perform other necessary actions if any
            });

This shorthand works because any function passed as an argument to the jQuery function, $(), is bound to the document ready event.

Another very important feature of $(document).ready() is that it can be used more than once within a document; the handlers run in the order in which they were registered. So the following piece of code is valid within a single file:
            $(document).ready(function(){
                        // some piece of code
            });
           
            $(document).ready(function(){
                        // other piece of code
            });

One scenario where .ready() could be used more than once is when a common JavaScript file is referenced across the whole project and another JavaScript file is referenced only on a particular page, each with its own .ready() handler. That page would then contain the following references:
<script src="/js/common-file.js" type="text/javascript"></script>
<script src="/js/specific-file.js" type="text/javascript"></script>
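
Why several handlers can coexist is easier to see with a toy model. The sketch below is not jQuery's actual code; createReadyQueue and its methods are hypothetical names for a simple queue that stores handlers and runs them in registration order once a "ready" signal arrives. Real jQuery differs in details, such as when late handlers are invoked.

```javascript
// Toy model of a ready queue (an assumption for illustration only).
function createReadyQueue() {
  let isReady = false;
  const handlers = [];
  return {
    // Register a handler; if the "DOM" is already ready, run it right away.
    ready(fn) {
      if (isReady) {
        fn();
      } else {
        handlers.push(fn);
      }
    },
    // Simulate the DOM becoming ready: run queued handlers in order, once.
    fire() {
      if (isReady) return;
      isReady = true;
      while (handlers.length > 0) {
        handlers.shift()();
      }
    },
  };
}

// Two files registering one handler each, as in the scenario above.
const queue = createReadyQueue();
const log = [];
queue.ready(() => log.push('common-file.js'));
queue.ready(() => log.push('specific-file.js'));
queue.fire();
console.log(log.join(', ')); // common-file.js, specific-file.js
```

Because each call only appends to the queue, neither file needs to know about the other's handler.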

References:
  1. http://think2loud.com/653-jquery-document-ready-howto/
  2. http://api.jquery.com/ready/
  3. http://docs.jquery.com/Tutorials:Multiple_$%28document%29.ready%28%29




Sunday, September 18, 2011

Crystal Reports: the very basics

Introduction:
Crystal Reports is a report-generating program, popularly called a report writer, which allows users to create a variety of reports from different sorts of data sources. It was originally developed by Crystal Services Inc. as a report writer for their accounting software. It is currently owned by the German software corporation SAP.
It was bundled with Microsoft Visual Studio from version 2003 to version 2008. Since Visual Studio 2010, it has been available as a separate free download from the SAP website.

Supported data sources:
Crystal Reports supports a variety of databases such as MySQL, Oracle, Microsoft SQL Server, Sybase and PostgreSQL. Spreadsheet programs such as Microsoft Excel are also accessible via Crystal Reports. Data can also be fed in through text files and XML files.

Further details and demo project will be added soon.
References:
  i. http://en.wikipedia.org/wiki/Crystal_Reports
  ii. Various online websites

XSD (XML Schema Definition)

An XSD defines how an XML file must be structured: which elements it may contain, in what order, and with what types of data. It serves as a design tool, a framework on which XML implementations can be built.
Since an XSD is itself an XML document, it is easy to learn and implement. One of its biggest advantages is that it supports type inheritance, which means that definitions can be reused.

XSD Base Data Types

A base data type can be used directly as the type of an element, or as the starting point for a user-defined type. For example, the “string” base type can be used to declare a “fullName” element in the following manner:
<xs:element name="fullName" type="xs:string" />

XSD provides a variety of such data types. Some of the more common ones are string, int, date, boolean, decimal and double.

XSD Facets

Facets restrict the values that a base data type may take. The following example uses the length facet to require that a password be exactly 8 characters long:
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:length value="8"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
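
To make the effect of this facet concrete, here is a rough JavaScript sketch of the check that a validator would apply to the password element. The hasExactLength function is a hypothetical name used only for illustration; real validation should be left to an XSD-aware validator.

```javascript
// Hypothetical helper mirroring <xs:length value="8"/> on an xs:string.
// The facet counts characters, so count code points rather than
// UTF-16 code units (which is what String.prototype.length reports).
function hasExactLength(value, length) {
  return [...value].length === length;
}

console.log(hasExactLength('s3cr3t!!', 8)); // true
console.log(hasExactLength('short', 8));    // false
```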

XSD Schema Element Types

XSD elements can be of two types: simple and complex.
a)      Simple: the element contains only a value, with no child elements, and its type is either a “base data type” or a user-defined type derived from one. An example is:
<xs:element name="fullName" type="xs:string" />

b)      Complex: when an element is supposed to contain child elements, it has to be defined as a complex type. An example is:
<xs:element name="complexElement">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="testChild1" type="xs:string" />
      <xs:element name="testChild2" type="xs:int" />
      <xs:element name="testChild3" type="userDefinedType" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Annotations (Comments)

XSD provides a commenting feature through the “annotation” element. Unlike XML comments (<!-- comment text -->), the annotation element is part of the schema itself. It can contain two kinds of child elements: documentation and appinfo.
a)      documentation: provides human-readable information about the schema for the people who read it
b)      appinfo: provides information intended for the applications that process the schema

Case Study

Consider a scenario in which XYZ Institute is a human resource training company. It requires information from its trainees in order to issue them a certificate after they successfully complete a training course.

The following is an example of an XML schema for the above requirement:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="XMLSchema1"
    targetNamespace="http://tempuri.org/XMLSchema1.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
> 
  <xs:element name="trainee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="regNo" type="xs:short"/>
        <xs:element name="fullName" type="xs:string" />
        <xs:element name="fatherName" type="xs:string" />
        <xs:element name="permanentAddress" type="xs:string" />
        <xs:element name="citizenNo" type="xs:string" />
        <xs:element name="passportNo" type="xs:string" />
        <xs:element name="creditHrs" type="xs:short" />
        <xs:element name="system" type="xs:string" />
        <xs:element name="fromDate" type="xs:date" />
        <xs:element name="toDate" type="xs:date" />
        <xs:element name="courses" type="courseList" />
        <xs:element name="receiveMail" type="udf1"  />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:annotation>
    <xs:documentation>This element provides human-readable information about the schema</xs:documentation>
    <xs:appinfo>This element provides information for applications that process the schema</xs:appinfo>
  </xs:annotation>

  <xs:annotation>
    <xs:documentation>
      The "courseList" simpleType is a list of strings that defines the courses element.
      "courseList" is an example of a globally scoped simple type.

      The "udf1" simpleType is used to store the trainees' confirmation of whether they
      want to participate in further training programs.
    </xs:documentation>
  </xs:annotation>

  <xs:simpleType name="courseList">
    <xs:list itemType="xs:string" />
  </xs:simpleType>

  <xs:simpleType name="udf1">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Y" />
      <xs:enumeration value="N" />
    </xs:restriction>
  </xs:simpleType>

</xs:schema>

The following is a sample XML file containing actual data that conforms to this schema:

<?xml version="1.0" encoding="utf-8"?>
<trainee xmlns="http://tempuri.org/XMLSchema1.xsd">
  <!-- This xml file contains the information on a trainee. -->
  <regNo>5915</regNo>
  <fullName>Madam Bahadur</fullName>
  <fatherName>Mohan Bahadur</fatherName>
  <permanentAddress>Kausaltar,Bhaktapur</permanentAddress>
  <citizenNo>48569/Bhaktapur</citizenNo>
  <passportNo>12597A</passportNo>
  <creditHrs>20</creditHrs>
  <system>Foreign Labor System</system>
  <fromDate>2010-10-01</fromDate>
  <toDate>2010-12-02</toDate>
  <courses>korean cooking refrigeration </courses>
  <receiveMail>Y</receiveMail>
</trainee>
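
The two user-defined simple types in the schema can be illustrated with a short JavaScript sketch. This only mirrors what the types mean for the sample values above; parseCourseList and isValidUdf1 are hypothetical helper names, and the sketch is not a substitute for validating against the schema.

```javascript
// Hypothetical helpers mirroring the two simple types of the case study.

// courseList is an xs:list of xs:string, so its items are separated
// by whitespace inside a single element.
function parseCourseList(text) {
  return text.trim().split(/\s+/).filter((item) => item.length > 0);
}

// udf1 is an xs:string restricted to the enumeration values "Y" and "N".
function isValidUdf1(text) {
  return text === 'Y' || text === 'N';
}

console.log(parseCourseList('korean cooking refrigeration ')); // ['korean', 'cooking', 'refrigeration']
console.log(isValidUdf1('Y'));   // true
console.log(isValidUdf1('yes')); // false
```

One consequence of using a list type is that a course name containing a space would be read as two separate items.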

I have just shared the few things I have learned about XSD; hopefully they will be useful. For detailed information on XSD, please visit the W3C website at http://www.w3c.org/ or the MSDN website.

References:

  · http://www.15seconds.com/issue/031209.htm
  · http://msdn.microsoft.com/en-us/library/ms256235.aspx
  · http://www.w3schools.com/schema/schema_intro.asp


Thursday, September 8, 2011

This is my first weblog

Hello internet enthusiasts,

I am Ashish Karki. I am a software developer from Kathmandu, Nepal. I will be posting about my research interests, along with other articles, on this blog over time.

Sincerely,
Ashish Karki