Post on 20-Dec-2015
1
Graduate Student Survival Guide: using cluster, gnuplot and LaTeX
Janardhan Rao Doppa
School of EECS, Oregon State University
doppa@eecs.oregonstate.edu
http://web.engr.oregonstate.edu/~doppa
2
EECS Cluster: what?
• A computing resource to run your jobs
• Offload your computing to remote machines
• Experiments or simulations for research
• Handy when you have to run a large number of experiments
• You don’t want to use your DELL (read as delicate) laptop
• Web: http://engr.oregonstate.edu/computing/cluster/
3
EECS Cluster: how?
• Connection: Connect to one of the “submit” hosts
    • submit32 or submit64
• Availability: Check the availability of slots in each queue
    • i386, em64t, amd64-low, eecs1
• Compile: Compile your code on the remote machine
• Script: Prepare the “submit script”
    • the command to run your program, which queue to use, and where to store the output and errors
• Submit: Submit the job using the “submit script”
• Monitor: Monitor the status
    • automatic email, or check the status manually
4
EECS Cluster: how?
• Connection: Connect to one of the “submit” hosts
    ssh <user>@submit32.eecs.oregonstate.edu   (or submit64.eecs.oregonstate.edu)
• Availability: Check the availability of slots in each queue
    • qstat command: learn the usage with “qstat --help”
    • “qstat -f -q <queue>”, where <queue> = i386, em64t, amd64-low or eecs1
    • Sample output (the 2/2 column is # occupied / # total slots):
        em64t@exec-em64t-01.hpc.engr.o BIP 2/2 2.02 lx24-amd64
          1402020 0.50500 run09_26.s matthchr r 10/28/2010 20:05:08 1
          1402032 0.50500 closfc     mathewm  r 10/28/2010 21:03:08 1
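The slot column in the sample output can also be read by a script. The following is a minimal Python sketch, assuming the queue line has the layout shown in the sample on this slide (queue name, queue type, slots, load, architecture). The function name `parse_queue_line` is hypothetical and not part of any cluster tool.

```python
def parse_queue_line(line):
    """Parse one queue line of `qstat -f` output into a small dict.

    Assumption: the third whitespace-separated field holds the slot
    counts, ending in "occupied/total" as in the sample on the slide.
    """
    fields = line.split()
    counts = fields[2].split("/")
    # Use the last two numbers, so a layout with an extra leading count
    # (if your qstat version prints one) is still handled.
    used, total = int(counts[-2]), int(counts[-1])
    return {
        "queue": fields[0],
        "used": used,
        "total": total,
        "free": total - used,
    }


sample = "em64t@exec-em64t-01.hpc.engr.o BIP 2/2 2.02 lx24-amd64"
info = parse_queue_line(sample)
print(info["queue"], "free slots:", info["free"])
```

A queue with zero free slots, like the one in the sample, means a newly submitted job will wait until a running job finishes.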
5
EECS Cluster: how?
• Script: Prepare the “submit script”

    #!/bin/csh

    # Job name
    #$ -N job_name

    # Current working directory
    #$ -cwd

    # Resource request for the faster bees
    #$ -soft -l mem_total=3.00G

    # Specify the hardware platform to run the job on.
    # Options are: amd64, em64t, i386, volumejob (use em64t if you don't care)
    #$ -q i386

    # Output/error file (merged)
    #$ -o output_file.out
    #$ -j y

    # Command sequence
    ./source_file
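When you have many experiments, writing one submit script per job by hand gets tedious. The sketch below generates the script text in Python, using the same directives as the script on this slide. The function name `make_submit_script`, the job names, and the argument passed to `./source_file` are all hypothetical, chosen only for illustration.

```python
def make_submit_script(job_name, queue, output_file, command, mem_total="3.00G"):
    """Return the text of a csh submit script with the slide's directives."""
    lines = [
        "#!/bin/csh",
        "# Job name",
        f"#$ -N {job_name}",
        "# Current working directory",
        "#$ -cwd",
        "# Soft resource request",
        f"#$ -soft -l mem_total={mem_total}",
        "# Queue to run the job on",
        f"#$ -q {queue}",
        "# Output/error file (merged)",
        f"#$ -o {output_file}",
        "#$ -j y",
        "# Command sequence",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: one script per cross-validation fold.
scripts = {
    f"fold{k}.csh": make_submit_script(
        job_name=f"exp_fold{k}",
        queue="em64t",
        output_file=f"fold{k}.out",
        command=f"./source_file {k}",
    )
    for k in range(3)
}
print(scripts["fold0.csh"])
```

Each value in `scripts` can be written to the file named by its key and then submitted like any hand-written submit script.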
6
EECS Cluster: how?
• Submit: Submit the job using the “submit script”
    • Change the permissions of the script: “chmod u+x script.csh”
    • Submit it: “qsub script.csh”
• Monitor: Monitor the status with “qstat -u <user>”
• Cautions:
    • You should have enough disk space (logs and outputs) and main memory (RAM) to run the program
    • Don’t monopolize the cluster – think of others also!
    • Budgeted experimental design: plan your experiments around the available resources (slots), hard deadlines (time), etc.
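A quick back-of-the-envelope calculation helps with budgeted experimental design. The sketch below is a rough Python estimate, assuming every job takes about the same time and that jobs run in "waves" that fill the free slots. The numbers (200 jobs, 1.5 hours each, 16 free slots) and the function name are hypothetical.

```python
import math


def estimate_wall_hours(num_jobs, hours_per_job, free_slots):
    """Rough wall-clock estimate: jobs run in waves of `free_slots` at a time."""
    if free_slots < 1:
        raise ValueError("need at least one free slot")
    waves = math.ceil(num_jobs / free_slots)
    return waves * hours_per_job


# Hypothetical plan: 5 datasets x 10 folds x 4 parameter settings.
num_jobs = 5 * 10 * 4
hours = estimate_wall_hours(num_jobs, hours_per_job=1.5, free_slots=16)
print(f"{num_jobs} jobs need roughly {hours} hours of wall-clock time")
```

If the estimate does not fit before your deadline, cut the number of settings or folds rather than grabbing every slot in the queue.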
7
gnuplot: what?
• A command-line program to generate 2D and 3D plots
    • better than Excel – no more frustrating clicks!
    • specify style, fonts and legends as commands
    • reuse the code for modifications or similar plots
    • generates very good PS or EPS figures, which work well with LaTeX
    • “gnu” is not the same as “GNU”: gnuplot is not part of the GNU project
• Web
    • http://www.gnuplot.info/
    • Available for both Linux and Windows
8
gnuplot: how?
• Data file: Create the data file to be used for the plot
    • Space-separated, column-wise data
• Code file: Create the gnuplot code file
    • Specify the title of the plot, axis names and ranges, legends, thickness of lines, colors, etc.
    • Specify the output format (PNG, PS or EPS), along with the filename
• Run: Run your code on the gnuplot command line
    • Copy and paste your code on the command line and press ENTER
9
gnuplot: how?
• Data file: Create the data file to be used for the plot
    • Space-separated, column-wise data

    0.1 100 73.13 70.14
    0.2 100 70.14 73.13
    0.3 100 70.14 73.13
    0.4 100 74.62 73.13
    0.5 100 74.62 73.13
    0.6 84 64.17 70.89
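Data files in this format are easy to produce from your experiment code. The sketch below is a minimal Python example that formats result rows as space-separated columns, using the values from this slide. The function name `format_rows` is hypothetical.

```python
def format_rows(rows):
    """Format rows of numbers as space-separated, column-wise text."""
    return "\n".join(" ".join(str(value) for value in row) for row in rows) + "\n"


# Values copied from the data file on the slide.
rows = [
    (0.1, 100, 73.13, 70.14),
    (0.2, 100, 70.14, 73.13),
    (0.3, 100, 70.14, 73.13),
    (0.4, 100, 74.62, 73.13),
    (0.5, 100, 74.62, 73.13),
    (0.6, 84, 64.17, 70.89),
]
text = format_rows(rows)
print(text, end="")
```

Saving `text` to a file such as `EM_comparison_novelty.txt` gives gnuplot one x column followed by one column per curve.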
10
gnuplot: how?
• Code file: Create the gnuplot code file

    set terminal postscript eps enhanced color "Helvetica" 18
    set key at graph 0.75,0.9
    set size 0.9, 0.9
    set title "Bayes-EM vs Ripper on NFL data \n (Novelty missingness model)"
    set ylabel "Accuracy (%)"
    set xlabel "Percentage of missing values"
    set xrange [0.1:0.6]
    set yrange [50:100]
    set output 'EM_comparison_novelty.eps'
    plot \
      'EM_comparison_novelty.txt' using 1:2 t 'Bayes-EM' with linespoints lt 2 lw 1 pt 7, \
      'EM_comparison_novelty.txt' using 1:3 t 'RIPPER-conservative' with linespoints lt 3 lw 3 pt 7, \
      'EM_comparison_novelty.txt' using 1:4 t 'RIPPER-aggressive' with linespoints lt 4 lw 3 pt 7
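The plot command repeats the same clause for each curve, so it can be generated when the number of curves changes. The sketch below builds such a command in Python, assuming column 1 is always the x axis and writing columns as plain numbers (`using 1:2`). The function name `plot_command` is hypothetical, and the line styles are left at gnuplot's defaults to keep the example short.

```python
def plot_command(data_file, series):
    """Build a gnuplot plot command.

    series: list of (column, title) pairs; column 1 of the data file is x.
    """
    clauses = [
        f"'{data_file}' using 1:{column} title '{title}' with linespoints"
        for column, title in series
    ]
    # A trailing backslash continues the gnuplot command on the next line.
    return "plot " + ", \\\n     ".join(clauses)


command = plot_command(
    "EM_comparison_novelty.txt",
    [(2, "Bayes-EM"), (3, "RIPPER-conservative"), (4, "RIPPER-aggressive")],
)
print(command)
```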
11
gnuplot: how?
• Run: Run your code on the gnuplot command line
    • Copy and paste your code on the command line and press ENTER
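Instead of pasting, gnuplot can also read the commands from a file given as a command-line argument, which is convenient when plots are regenerated after every experiment. The sketch below wraps that call in Python. It assumes `gnuplot` is installed and on your PATH; the file name `plot.gp` and the function names are hypothetical.

```python
import shutil
import subprocess


def gnuplot_argv(script_path):
    """Argument list for running a gnuplot code file non-interactively."""
    return ["gnuplot", script_path]


def run_gnuplot(script_path):
    """Run gnuplot on the code file; raises if gnuplot is missing or fails."""
    if shutil.which("gnuplot") is None:
        raise RuntimeError("gnuplot was not found on PATH")
    subprocess.run(gnuplot_argv(script_path), check=True)


# Only the command is shown here; call run_gnuplot("plot.gp") to execute it.
print(" ".join(gnuplot_argv("plot.gp")))
```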
12
gnuplot: resources
• Short and quick reference guide: http://sparky.rice.edu/gnuplot.html
• Web resources: http://www.gnuplot.info/
    • Demos, tutorials, sample codes and scripts
• Many useful sample plots are available at: http://www.cse.iitb.ac.in/silmaril/br/lib/exe/fetch.php?id=students&cache=cache&media=students:gnuplot.tgz
    • Thanks to Bhaskaran Raman and Kameshwari Chebrolu.
13
LaTeX: what?
• A manuscript preparation system
    • better than Word – no more equation editors!
    • Math formulas and equations are easier to write
    • Bibliography and cross-referencing are much easier
    • Almost all conference and journal papers are written using LaTeX
    • De facto standard in academia – get used to it!
• Web
    • http://en.wikibooks.org/wiki/LaTeX
    • Windows editors: TeXnicCenter and WinEdt
    • Linux editors: LyX and Kile
14
LaTeX: basic files
• LaTeX code
    • .tex – LaTeX input code file
    • .sty – style file
• Bibliography
    • .bib – bibliography file
    • .bst – bibliography style file
• Output
    • .dvi – device-independent file
    • .ps – PostScript file
15
LaTeX: writing code file
• Start with an existing template
• Basic commands
    • \section, \subsection, \subsubsection
    • Text mode vs. math mode ($ $)
    • Math symbols: \alpha, \beta, \gamma
    • \begin{environment} and \end{environment}
        • \begin{itemize} and \end{itemize}
        • \begin{equation} and \end{equation}
        • \begin{figure} and \end{figure}
        • \begin{table} and \end{table}
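Result tables are a common place where hand-typed LaTeX goes wrong, so it can help to generate the rows from your experiment output. The sketch below is a minimal Python example that produces a `tabular` block, which you would place inside a `\begin{table}` ... `\end{table}` environment. The function name `to_tabular` and the column headers are hypothetical, and the headers must not contain characters that need escaping in LaTeX (such as `%` or `&`).

```python
def to_tabular(headers, rows):
    """Return a LaTeX tabular environment with right-aligned columns."""
    column_spec = "r" * len(headers)
    lines = [
        f"\\begin{{tabular}}{{{column_spec}}}",
        "\\hline",
        " & ".join(headers) + " \\\\",
        "\\hline",
    ]
    for row in rows:
        lines.append(" & ".join(str(value) for value in row) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


table = to_tabular(
    ["Missing", "Bayes-EM", "RIPPER-cons.", "RIPPER-aggr."],
    [(0.1, 100, 73.13, 70.14), (0.6, 84, 64.17, 70.89)],
)
print(table)
```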
16
LaTeX: bibliography file
• Sample bibliography entries

    @inproceedings{CRF-ICML:01,
      author    = {John Lafferty and Andrew McCallum and Fernando Pereira},
      title     = {Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},
      booktitle = {ICML'01: Proceedings of the 18th International Conference on Machine Learning},
      year      = {2001},
    }

    @article{TRITRAINING-TKDE:05,
      author  = {Zhi-Hua Zhou and Ming Li},
      title   = {Tri-Training: Exploiting Unlabeled Data Using Three Classifiers},
      journal = {IEEE Transactions on Knowledge and Data Engineering},
      volume  = {17},
      number  = {11},
      year    = {2005},
    }
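If your references already live in a spreadsheet or a script, entries like these can be generated instead of typed. The sketch below formats one entry in Python; the function name `bibtex_entry` is hypothetical. It stores the issue number in the `number` field, which is the field the standard BibTeX styles read for articles.

```python
def bibtex_entry(entry_type, key, fields):
    """Format one BibTeX entry from a dict of field names to values."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


entry = bibtex_entry(
    "article",
    "TRITRAINING-TKDE:05",
    {
        "author": "Zhi-Hua Zhou and Ming Li",
        "title": "Tri-Training: Exploiting Unlabeled Data Using Three Classifiers",
        "journal": "IEEE Transactions on Knowledge and Data Engineering",
        "volume": "17",
        "number": "11",
        "year": "2005",
    },
)
print(entry)
```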
17
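LaTeX: compiling
• LaTeX code with “latex” or “pdflatex”
• BibTeX code with “bibtex”
    latex <code>
    bibtex <code>
    latex <code>
    latex <code>
    • bibtex reads the .aux file written by the first latex run, so it takes the same <code> name
    • Two more passes of latex after bibtex resolve the citations and cross-references
• Collaborative writing: use a CVS or SVN repository – much easier!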
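The compile steps are easy to forget under deadline pressure, so many people script them. The sketch below lists a common full sequence in Python (latex, bibtex, then latex twice more so that citations and cross-references settle) and runs it on request. It assumes the tools are on your PATH; the base name `paper` and the function names are hypothetical.

```python
import subprocess


def build_commands(basename, engine="latex"):
    """Command sequence for a document with a BibTeX bibliography."""
    return [
        [engine, basename],    # writes the .aux file with the citation keys
        ["bibtex", basename],  # reads the .aux file, writes the .bbl file
        [engine, basename],    # pulls in the bibliography
        [engine, basename],    # resolves remaining cross-references
    ]


def build(basename, engine="latex"):
    """Run each step in order, stopping at the first failure."""
    for argv in build_commands(basename, engine):
        subprocess.run(argv, check=True)


# Only the commands are printed here; call build("paper") to run them.
for argv in build_commands("paper", engine="pdflatex"):
    print(" ".join(argv))
```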
18
LaTeX: resources
• LaTeX cheat sheet
    • http://www.ctan.org/tex-archive/info/latexcheat/latexcheat/latexsheet.pdf
• LaTeX wiki book
    • http://en.wikibooks.org/wiki/LaTeX/
• Learn tips and tricks
    • From expert users
    • From online forums
• Grow your bag of tricks – it will save you time at deadlines!
19
LaTeX in PowerPoint
• TeXPoint – a LaTeX add-on for PowerPoint and Word
    • http://texpoint.necula.org/
    • http://web.engr.oregonstate.edu/~mehtane/latex/index.html
• TeXclip – LaTeX to image
    • http://maru.bonyari.jp/texclip/texclip.php
• Beamer – slides using LaTeX
    • http://bitbucket.org/rivanvx/beamer/wiki/Home
20
MS students: Advice
• Hard to fund all the MS students
    • bad economy, low grant money, etc.
    • MS students are a short-term investment for faculty, so they will choose their bets carefully!
• Look for alternative funding sources
    • BSG, Media Services, Library, science laboratories (e.g., chemistry, biology), etc.
• Bottom line: Grad school is costly, but a very good long-term investment!
21
MS students: Advice
• Immediate reward vs. long-term average reward
• Worst case: you finish graduate school on your own money
• Concentrate on your education and develop skills
• Go for a summer internship – money and experience
• Specialize in something – good job market!
• You can pay off your loans in less than 6 months!
• Don’ts
    • Don’t finish classes quickly and graduate with an ME – bad idea!
    • Don’t worry about money while in school – you won’t be productive