Efficient Algorithm to manipulate a matrix

  • 7/29/2019 Efficient Algorithm to manipulate a matrix

    CIS 501 (Fall 2011)

    Lab 2

    Wei Lee Woon, CIS Program,

    Masdar Institute, Abu Dhabi, UAE

    1 Introduction

    In today's laboratory, the aim is to achieve the following:

    1. (Quickly) tie up a few loose ends from the last laboratory, particularly plotting and using functions/scripts.

    2. Implement a k-NN classifier, and use it to classify data from a well-known, publicly available data set.

    2 Plotting

    Plotting numerical results in Octave is extremely easy. For instance, try running the following code:

    x = linspace(0, 2*pi, 100);

    y = sin(x);

    plot(y);

    The previous commands display the sine function on the screen in a separate window. In its simplest form, the plot command takes only one argument, the values for the y-axis, and plots them against their indices. If you try:

    plot(x, y);

    the curve is the same, but this time the y values are plotted against the x values. We can also make more complicated plots in Octave. For instance, let's try to plot a second curve:

    z = cos(x);

    hold on

    plot(x, z, '*r');

    hold off

    The previous commands display a second curve on top of the previous one. First, the hold on command tells Octave not to erase the previous plot. The hold off command releases the hold on the figure. Try again:

    plot(x, y);

    In general, unless hold on is used, Octave is always in hold off mode, so this call replaces the existing curves. Second, there is an extra third argument in the plot command, i.e. '*r'. plot accepts extra arguments after x and y in order to set the graphical characteristics of the plot; here, '*r' draws red star markers. I suggest looking at the Octave help (help plot, or more conveniently the online documentation) to become familiar with the different characteristics.
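As a small illustration of format arguments, the following sketch draws the two curves in a single plot call with different line styles. The particular style strings are choices made for this example only, and the figure is created invisibly (an assumption of this sketch) so that the snippet also runs in a session without a display.

```octave
x = linspace(0, 2*pi, 100);
y = sin(x);
z = cos(x);

fig = figure('visible', 'off');      % example choice: keep the window hidden
h = plot(x, y, '-b', x, z, '--r');   % solid blue sine, dashed red cosine

% plot returns one graphics handle per curve drawn
disp(numel(h));
```

Passing several x, y, format triples to one plot call is an alternative to hold on when all curves are known in advance.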

    The following command:

    axis tight

    is often used to help tidy up a plot and prepare it for inclusion in a report, for example.

    Once you're happy with your plot, use the following command to generate a graphic of the plot:

    print -djpg .jpg

    Confirm that this generates a JPEG file with the specified name. Use the help command to determine other possible file formats that can be generated using the print command. You should also take some time to familiarize yourself further with the Octave plotting facilities, using the trusty help command.

    3 Functions and Scripts

    There are many situations where we want to write code that captures a workflow of operations. So far we have worked with the interactive session of Octave. However, Octave is also a programming language which supports batch sessions, i.e. you can store a set of commands in a file and run them all together later. These files are known as M-files (in common with their largely compatible Matlab siblings). They are useful for automating computations that you would otherwise have to perform repeatedly from the command line. There are two main options in Octave:

    Scripts A script is the simplest kind of M-file: it contains a sequence of statements. Let's open a new text file. In Linux you can use, for instance, Emacs. Write the previous plot commands:

    a=100;

    x = linspace(0, 2*pi, a);

    y = sin(x);

    plot(x, y);

    When you finish, save the file as plotScript.m in the directory lab1 you have created. An M-file always has a .m extension. Now go back to the Octave command line and type

    plotScript

    As you can see, a script runs the operations described in the file and can produce graphical output using commands like plot. Moreover, scripts can operate on existing data in the workspace, or they can create new data on which to operate.

    To check this, try clearing all variables from the workspace (using clear), then running plotScript. If you check the workspace again, you will notice that the variables a, x and y are now present.

    Functions Functions are special M-files which can accept input arguments and return output arguments. In this case all the internal variables are local to the function. Using the previous example we can write the following function. In a new file plotFunction.m try:

    function y1=plotFunction(a1)

    x1 = linspace(0, 2*pi, a1);

    y1 = sin(x1);

    plot(x1, y1);

    After which, run the following code:

    out=plotFunction(100);

    You can see that Octave plots the figure. However, if you type whos, none of the internal variables of plotFunction.m are available in the workspace.
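The locality of function variables can also be checked programmatically. The sketch below is an illustration only: sineSamples is a hypothetical function (a variant of plotFunction without the plot call), defined inside a script file. Octave permits this as long as the file does not start with a function definition, hence the leading 1;.

```octave
1;  % any statement first, so that Octave treats this file as a script

% Hypothetical variant of plotFunction that only computes the samples
function y1 = sineSamples(a1)
  x1 = linspace(0, 2*pi, a1);   % x1 is local to the function
  y1 = sin(x1);
end

out = sineSamples(100);

disp(size(out));              % out holds the returned values
disp(exist('x1', 'var'));     % exist returns 0 when no such variable is in the workspace
```

Only the value assigned to out reaches the workspace; x1 and y1 disappear when the function returns.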

    4 The k-NN classifier

    4.1 Introduction

    This is the main activity associated with today's laboratory. You are required to implement two versions of the k-NN classifier, which was covered in the previous lecture. To test your classifier, you will be using a slightly modified version of the classic Iris data set (see http://en.wikipedia.org/wiki/Iris_flower_data_set).

    Two files have been provided to you: iris_tra, containing the training instances, and iris_tes, containing the test instances. Each set contains 75 instances and can be loaded directly into Octave using the load command.

    The following is a portion of the iris_tra file:

    0.224 0.624 0.067 0.043 1.0 0.0 0.0

    0.749 0.502 0.627 0.541 0.0 1.0 0.0

    0.557 0.541 0.847 1.000 0.0 0.0 1.0

    .

    .

    0.224 0.208 0.337 0.416 0.0 1.0 0.0

    0.529 0.584 0.745 0.918 0.0 0.0 1.0

    As can be seen, it is a plain text file containing 75 × 7 numbers. Each row is one instance: the first four columns hold the feature vector, while the last three columns hold the class label. Each of these three columns corresponds to one of the three species of Iris flowers, with a 1.0 marking the species of the instance. The file iris_tes is structured similarly but, as mentioned, this subset of the data is to be used for testing purposes only.
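The column layout described above can be sketched as follows. Since the data files themselves are not reproduced here, the matrix below simply copies the five rows shown in the excerpt; with the real file, data would come from load instead. The variable names are assumptions made for this illustration.

```octave
% Five rows copied from the excerpt of the training file
data = [0.224 0.624 0.067 0.043 1.0 0.0 0.0;
        0.749 0.502 0.627 0.541 0.0 1.0 0.0;
        0.557 0.541 0.847 1.000 0.0 0.0 1.0;
        0.224 0.208 0.337 0.416 0.0 1.0 0.0;
        0.529 0.584 0.745 0.918 0.0 0.0 1.0];

features = data(:, 1:4);   % one feature vector per row
labels   = data(:, 5:7);   % one indicator column per species

% Convert each indicator row into a class index (1, 2 or 3)
[~, class_index] = max(labels, [], 2);
disp(class_index');
```

For these five rows the class indices are 1, 2, 3, 2 and 3, matching the position of the 1.0 in each label row.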

    4.2 The task

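    Assume that you only know the labels of the instances in the training set. Build two k-NN classifiers: an unweighted classifier, and one which uses a distance weighting scheme. Please implement the following two weighting schemes, where d is the distance between the test instance and a neighbour and w is the weight of that neighbour's vote: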

    1. Inverse distance: $w = \frac{1}{d}$

    2. Gaussian based: $w = \exp(-d^2)$
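As a sketch, both weights can be computed element-wise for a vector of neighbour distances. The distances below are made-up values, and the minus sign in the Gaussian weight follows the usual Gaussian form. Note that the inverse weight is undefined for d = 0, so an implementation needs some convention for exact matches, which this handout does not specify.

```octave
d = [0.5 1 2];                 % made-up distances to three neighbours

w_inverse  = 1 ./ d;           % inverse distance: closer neighbours count more
w_gaussian = exp(-(d .^ 2));   % Gaussian based: weight decays with squared distance

disp(w_inverse);
disp(w_gaussian);
```

Both schemes give the nearest neighbour the largest weight; the Gaussian weight falls off much faster as the distance grows.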

    Your code should include at least two functions, myknn and myweightedknn. Function myknn should have the following function signature:

    function output=myknn(training_features,training_labels,test_features,k)

    training_features -> training features in ntraining x dim format

    training_labels -> labels for training instances, ntraining x nclasses format

    test_features -> test features in ntest x dim format

    k -> Number of neighbours to consider

    output -> ntest x 1 vector of predicted classes

    (ntraining and ntest are the number of training and test instances respectively, dim is the dimensionality of the feature space and nclasses is the number of classes.)
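One possible shape for such a function is sketched below; it is not a reference solution. The name knn_sketch, the use of Euclidean distance and the tie-breaking behaviour (the lowest class index wins) are assumptions of this sketch, and the toy data at the end is made up. The function is defined inside a script, hence the leading 1;.

```octave
1;  % any statement first, so that Octave treats this file as a script

% Hypothetical unweighted k-NN with the argument layout described above
function output = knn_sketch(training_features, training_labels, test_features, k)
  ntest = size(test_features, 1);
  output = zeros(ntest, 1);
  for i = 1:ntest
    % Squared Euclidean distance from test instance i to every training instance
    diffs = training_features - test_features(i, :);
    d2 = sum(diffs .^ 2, 2);
    [~, order] = sort(d2);
    % Summing the indicator rows of the k nearest neighbours counts the votes
    votes = sum(training_labels(order(1:k), :), 1);
    [~, winner] = max(votes);   % ties go to the lowest class index
    output(i) = winner;
  end
end

% Made-up toy data: two well-separated classes in two dimensions
train_X = [0 0; 0 1; 5 5; 5 6];
train_Y = [1 0; 1 0; 0 1; 0 1];
test_X  = [0.2 0.3; 5.1 5.4];

pred = knn_sketch(train_X, train_Y, test_X, 3);
disp(pred');
```

With k = 3, each toy test point has two neighbours from its own cluster and one from the other, so the predictions are class 1 and class 2 respectively.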

    Function myweightedknn will be very similar, but with an additional parameter weightfunction, as follows:

    function output=myweightedknn(training_features,training_labels, ...
                                  test_features,k,weightfunction)

    training_features -> training features in ntraining x dim format

    training_labels -> labels for training instances, ntraining x nclasses format

    test_features -> test features in ntest x dim format

    k -> Number of neighbours to consider

    output -> ntest x 1 vector of predicted classes

    weightfunction -> 1 (inverse) 2 (gaussian-based)

    This help text follows that of myknn; apart from the extra weightfunction parameter, myweightedknn should have the same signature, but should implement the distance-weighted k-NN classifier instead.
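A distance-weighted vote can be sketched along the following lines; again this is an illustration under stated assumptions rather than a reference solution. The name weighted_knn_sketch, the Euclidean distance, the small eps guard against division by zero and the made-up toy data are all assumptions of this sketch.

```octave
1;  % any statement first, so that Octave treats this file as a script

% Hypothetical weighted k-NN; weightfunction: 1 (inverse), 2 (Gaussian based)
function output = weighted_knn_sketch(training_features, training_labels, ...
                                      test_features, k, weightfunction)
  ntest = size(test_features, 1);
  output = zeros(ntest, 1);
  for i = 1:ntest
    diffs = training_features - test_features(i, :);
    d = sqrt(sum(diffs .^ 2, 2));          % Euclidean distances
    [d_sorted, order] = sort(d);
    nearest = order(1:k);
    d_near = d_sorted(1:k);
    if weightfunction == 1
      w = 1 ./ (d_near + eps);             % eps guard is an assumption
    else
      w = exp(-(d_near .^ 2));
    end
    % Each neighbour adds its weight to the column of its own class
    votes = w' * training_labels(nearest, :);
    [~, winner] = max(votes);
    output(i) = winner;
  end
end

% Made-up toy data: one close neighbour of class 1, two farther ones of class 2
train_X = [0 0; 2 0; 2.2 0];
train_Y = [1 0; 0 1; 0 1];
test_X  = [0.5 0];

pred_inverse  = weighted_knn_sketch(train_X, train_Y, test_X, 3, 1);
pred_gaussian = weighted_knn_sketch(train_X, train_Y, test_X, 3, 2);
disp([pred_inverse pred_gaussian]);
```

In this toy case an unweighted vote with k = 3 would pick class 2 by two votes to one, whereas both weighting schemes pick class 1 because the single class 1 neighbour is much closer.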

    Once you have completed your classifier, you can use the script genlabresults.m (available on Moodle) to test your classifiers over a range of values of k. You might also want to note that genlabresults.m uses several functions, namely legend, title and axis, which have not been covered before, but which will very likely come in useful in the future.

    You should be able to produce the plot shown in figure 1 (please check).

    Figure 1: k ranging from 2 to 20, line plots (legend entries: Unweighted, Inverse weighted, Gaussian weighted)
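To evaluate a classifier, its predicted class indices have to be compared with the indicator-style labels of the test set. The following is a minimal sketch of that comparison on made-up values. How genlabresults.m actually scores the classifiers is not shown in this handout, so the variable names and the choice of error count and accuracy as measures are assumptions.

```octave
% Made-up test labels (indicator format) and made-up predictions
test_labels = [1 0 0; 0 1 0; 0 0 1; 0 1 0];
predicted   = [1; 2; 2; 2];

% Recover the true class index of each test instance
[~, actual] = max(test_labels, [], 2);

n_errors = sum(predicted ~= actual);    % number of misclassified instances
accuracy = mean(predicted == actual);   % fraction classified correctly

disp(n_errors);
disp(accuracy);
```

Here the third instance is misclassified, giving one error and an accuracy of 0.75.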

    4.3 Submission guidelines

    Deadline for submission is 12pm, 10th of October (next Monday). Late submissions will be rejected.

    As usual, only electronic submissions will be accepted. The following is the submission procedure (which is almost the same as for the previous laboratory):

    1. Submit only the M-files in your implementation of the myknn and myweightedknn functions. These should be sent as attachments in your submission e-mail (along with any additional support functions which are required to run your code).

    2. Send your solutions via e-mail to the TA, Khulood Al Junaibi (e-mail: [email protected]), and CC a copy to me.

    3. Format your subject line as follows: [CIS501] Fall 2011, Assignment #2 solution. Name:

    4. If you do not get an acknowledgement e-mail from the TA, please re-send the assignment.
