Workshop on command line tools - day 2
Uploaded by leandro-lima
I Workshop on command-line tools
(day 2)
Center for Applied Genomics, Children's Hospital of Philadelphia
February 12-13, 2015
awk - a powerful way to check conditions and show specific columns
Example: show only CNVs that span 3 or fewer targets (exons)
# tail -n +2 starts at line 2, skipping the header; column 8 is NUM_TARG
tail -n +2 DATA.xcnv | awk '$8 <= 3'
awk - different ways to do the same thing
tail -n +2 DATA.xcnv | awk '$8 <= 3'
# same effect 1
tail -n +2 DATA.xcnv | awk '$8 <= 3 {print}'
# same effect 2 (an if statement must go inside the { } action block)
tail -n +2 DATA.xcnv | awk '{ if ($8 <= 3) print }'
# same effect 3
tail -n +2 DATA.xcnv | awk '{ if ($8 <= 3) print $0 }'
# different effect: prints only the first column (sample ID)
tail -n +2 DATA.xcnv | awk '{ if ($8 <= 3) print $1 }'
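A quick way to convince yourself that the first four forms agree is to run them on a tiny file and compare the outputs. The file toy.xcnv and its rows are made up for illustration; only column 8 (standing in for NUM_TARG) matters. Note that awk only accepts an if statement inside a { } action block, which is how it is written here.

```shell
# Made-up file: a header plus three rows; column 8 stands in for NUM_TARG
tmp=$(mktemp -d) && cd "$tmp"
printf 'SAMPLE\tc2\tc3\tc4\tc5\tc6\tc7\tNUM_TARG\n' > toy.xcnv
printf 'S1\t.\t.\t.\t.\t.\t.\t2\n' >> toy.xcnv
printf 'S2\t.\t.\t.\t.\t.\t.\t5\n' >> toy.xcnv
printf 'S3\t.\t.\t.\t.\t.\t.\t3\n' >> toy.xcnv

# Four spellings of the same filter
a=$(tail -n +2 toy.xcnv | awk '$8 <= 3')
b=$(tail -n +2 toy.xcnv | awk '$8 <= 3 {print}')
c=$(tail -n +2 toy.xcnv | awk '{ if ($8 <= 3) print }')
d=$(tail -n +2 toy.xcnv | awk '{ if ($8 <= 3) print $0 }')
# The "different effect" version keeps only column 1
e=$(tail -n +2 toy.xcnv | awk '{ if ($8 <= 3) print $1 }')

[ "$a" = "$b" ] && [ "$a" = "$c" ] && [ "$a" = "$d" ] && echo "same output"
echo "$e"
```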
awk - more options for conditions
# Applying XHMM "gold" thresholds (KB >= 1,
# NUM_TARG >= 3, Q_SOME >= 65, Q_NON_DIPLOID >= 65)
tail -n +2 DATA.xcnv | \
awk '$4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' \
> DATA.gold.xcnv
# Using only awk (NR > 1 skips the header line)
awk 'NR > 1 && $4 >= 1 && $8 >= 3 &&
$10 >= 65 && $11 >= 65' DATA.xcnv > DATA.gold2.xcnv
diff - compare files line by line
# Compare (no output means the files are identical)
diff DATA.gold.xcnv DATA.gold2.xcnv
# Tip: install tkdiff to use a graphical version of diff
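Because diff exits with status 0 when the files match, it can also be used as a check inside scripts. A minimal sketch on a made-up toy.xcnv (hypothetical rows laid out like the XHMM columns used above) showing that both ways of applying the gold thresholds give the same result:

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Made-up rows; only KB ($4), NUM_TARG ($8), Q_SOME ($10), Q_NON_DIPLOID ($11) matter
{
  printf 'SAMPLE\tCNV\tINTERVAL\tKB\tCHR\tMID_BP\tTARGETS\tNUM_TARG\tQ_EXACT\tQ_SOME\tQ_NON_DIPLOID\n'
  printf 'S1\tDEL\tchr1:1001-2900\t1.9\t1\t1950\t1..4\t4\t50\t80\t80\n'
  printf 'S2\tDUP\tchr2:101-500\t0.4\t2\t300\t1..2\t2\t30\t40\t40\n'
  printf 'S3\tDEL\tchr3:1001-9900\t8.9\t3\t5450\t1..9\t9\t70\t99\t99\n'
} > toy.xcnv

tail -n +2 toy.xcnv | awk '$4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' > gold1.xcnv
awk 'NR > 1 && $4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' toy.xcnv > gold2.xcnv

# diff prints nothing and exits with status 0 when the files are identical
if diff gold1.xcnv gold2.xcnv > /dev/null; then
  echo "identical"
else
  echo "different"
fi
```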
Exercises
1. Using adhd.map, show 10 SNPs with IDs starting with 'rs' on chromosome 2, between positions 1 Mb and 2 Mb
2. Check which chromosome has the most SNPs
3. Check which SNP IDs are duplicated
Suggestions
# 1.
grep '\brs' adhd.map | \
awk '$1 == 2 && int($4) >= 1000000 && int($4) <= 2000000' | \
head -n 10
# 2.
cut -f1 adhd.map | sort | uniq -c | sort -k1n | tail -1
# 3.
cut -f2 adhd.map | sort | uniq -c | awk '$1 > 1'
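The same three exercises can also be solved with awk alone. This is a sketch of one alternative, not the workshop's reference answer, run on a made-up toy.map (chromosome, SNP ID, genetic distance, position):

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Made-up map file (tab-separated)
printf '1\trs1\t0\t500\n1\trs2\t0\t900\n2\trs3\t0\t1500000\n1\trs2\t0\t950\n2\tkgp4\t0\t1200000\n' > toy.map

# 1. Test the ID column itself instead of grepping the whole line
ex1=$(awk '$1 == 2 && $2 ~ /^rs/ && $4 >= 1000000 && $4 <= 2000000' toy.map | head -n 10)

# 2. Count SNPs per chromosome in an associative array, keep the largest count
ex2=$(awk '{n[$1]++} END {for (c in n) print n[c], c}' toy.map | sort -k1,1nr | head -n 1)

# 3. Print each duplicated ID once (when it is seen for the second time)
ex3=$(awk 'seen[$2]++ == 1 {print $2}' toy.map)

printf '%s\n' "$ex1" "$ex2" "$ex3"
```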
More awk - inserting external variables
awk -v Mb=1000000 -v chrom=2 \
'$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb' \
adhd.map | less
# Printing specific columns (the { of an action must be on the same line as its condition)
awk -v Mb=1000000 -v chrom=2 \
'$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb {print $1" "$2" "$4}' \
adhd.map | less
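Single quotes stop the shell from expanding variables inside the awk program, so values already stored in shell variables have to be handed over explicitly with -v. A minimal sketch, where the variable names and the map rows are hypothetical:

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Made-up map rows: chromosome, SNP ID, genetic distance, position (tab-separated)
printf '2\trs10\t0\t1200000\n2\trs11\t0\t2500000\n3\trs12\t0\t1500000\n' > toy.map

# Hypothetical shell variables; -v copies their values into awk variables
chrom=2
start=1000000
stop=2000000
hits=$(awk -v chrom="$chrom" -v start="$start" -v stop="$stop" \
  '$1 == chrom && $4 >= start && $4 <= stop {print $2}' toy.map)
echo "$hits"
```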
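Using awk to check the number of variants in ped files
# (NF-6)/2: a ped line has 6 fixed columns, then 2 alleles per variant
# Options using only awk, but they take (much) more time, because awk keeps reading the whole file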
awk 'NR == 1 {print (NF-6)/2}' adhd.ped
awk 'NR < 2 {print (NF-6)/2}' adhd.ped # Slow, too
# Better alternative
head -n 1 adhd.ped | awk '{print (NF-6)/2}'
# Now, the map file
wc -l adhd.map
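The awk-only versions are slow because awk goes on reading every line after the first one. Adding exit to the action stops it early. A sketch on a made-up two-line ped file with 3 variants per line:

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Made-up ped lines: 6 fixed columns (family, individual, father, mother, sex,
# phenotype), then 2 alleles per variant; 3 variants per line here
printf 'F1 I1 0 0 1 2 A G C C T T\nF2 I2 0 0 2 1 G G C T A T\n' > toy.ped

n_head=$(head -n 1 toy.ped | awk '{print (NF-6)/2}')
# awk alone is also quick if it stops reading after the first line
n_exit=$(awk 'NR == 1 {print (NF-6)/2; exit}' toy.ped)
echo "$n_head $n_exit"
```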
time - time command execution
time head -n 1 adhd.ped | awk '{print (NF-6)/2}'
real 0m0.485s
user 0m0.391s
sys 0m0.064s
time awk 'NR < 2 {print (NF-6)/2}' adhd.ped
# Forget it... just press Ctrl+C
real 1m0.611s
user 0m51.261s
sys 0m0.826s
top - display and update sorted information about processes / display Linux tasks
top
z : color
k : kill process
u : choose specific user
c : show complete commands running
1 : show usage of individual CPUs
q : quit
screen - screen manager with terminal emulation (i)
screen : start a new session
screen -S <session_name> : start a new session with a name
Ctrl+a, then c: create window
Ctrl+a, then n: go to next window
Ctrl+a, then p: go to previous window
Ctrl+a, then 0: go to window number 0
Ctrl+a, then d: detach (leave your session, but keep it running)
screen - screen manager with terminal emulation (ii)
Ctrl+a, then [ : activate copy mode (to scroll the screen)
q : quit copy mode
exit : close current window
screen -r : resume the only detached session
screen -r <session_name> : resume a specific detached session
screen -rD <session_name> : reattach a session, detaching it elsewhere first
split - split a file into pieces
split -l <lines_of_each_piece> <input> <prefix>
# Example
split -l 100000 adhd.map map_
wc -l map_*
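By default split names the pieces with two-letter suffixes (aa, ab, ac, ...), so concatenating them in name order rebuilds the input. A small sketch on a made-up 250-line file:

```shell
tmp=$(mktemp -d) && cd "$tmp"
seq 1 250 > toy.txt          # made-up 250-line file
split -l 100 toy.txt part_   # pieces: part_aa, part_ab, part_ac
wc -l part_*
# Concatenating the pieces in name order gives back the original
cat part_* | cmp -s - toy.txt && echo "reassembled OK"
```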
in-line Perl/sed to find and replace (i)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr/CHR/g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr//g'
# Other possibilities (the character after "s" sets the delimiter)
head DATA.gold.xcnv | cut -f3 | perl -pe 's|chr||g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's!chr!!g'
head DATA.gold.xcnv | cut -f3 | sed 's/chr//g'
# Creating a BED file (chromosome, start, end separated by tabs)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/[:-]/\t/g'
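BED start coordinates are 0-based (half-open intervals). If the xcnv intervals are 1-based and inclusive, which is an assumption here rather than something the slides state, the start should be shifted down by one. A sketch with awk on made-up intervals:

```shell
# Made-up intervals in chr:start-end form.
# Split on ':' or '-' and shift the start down by one (assumes 1-based, inclusive input)
bed=$(printf 'chr1:1001-2000\nchr22:500-800\n' | \
  awk -F'[:-]' -v OFS='\t' '{print $1, $2 - 1, $3}')
printf '%s\n' "$bed"
```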
in-line Perl/sed to find and replace (ii)
# "s" means substitute
# "g" means global (replace all matches, not only first)
# See the difference...
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/g'
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/'
# Adding more replacements
head DATA.gold.xcnv | cut -f3 | sed 's/1/one/g; s/2/two/g'
copy from terminal to clipboard / paste from clipboard to terminal (macOS)
# This is like Ctrl+V in your terminal
pbpaste
# This is like Ctrl+C from your terminal
head DATA.xcnv | pbcopy
# Then, Ctrl+V in other text editor
# On Linux, you can install "xclip" (http://sourceforge.net/projects/xclip/)
datamash - command-line calculations
tail -n +2 DATA.xcnv | \
head | \
cut -f6,10,11 | \
datamash mean 1 sum 2 min 3
# mean of the 1st column (field 6), sum of the 2nd (field 10),
# minimum of the 3rd (field 11)
http://www.gnu.org/software/datamash/
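If datamash is not installed, the same three summaries can be computed with plain awk. A sketch on made-up, tab-separated numbers standing in for the output of cut -f6,10,11:

```shell
# Made-up three-column, tab-separated input
summary=$(printf '10\t80\t70\n20\t90\t65\n30\t99\t90\n' | \
  awk -F'\t' '
    NR == 1 { min3 = $3 + 0 }
    { sum1 += $1; sum2 += $2; if ($3 + 0 < min3) min3 = $3 + 0 }
    END { print sum1 / NR, sum2, min3 }')
echo "$summary"
```

Unlike datamash, which separates its results with tabs, this prints them separated by spaces.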
touch - change file access and modification times
ls -lh DATA.gold.xcnv
touch DATA.gold.xcnv
ls -lh DATA.gold.xcnv
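touch also creates an empty file when the name does not exist yet, and the test operator -nt ("newer than") makes the timestamp change visible. A sketch with throwaway files; the sleep calls only make sure the timestamps differ by at least a second:

```shell
tmp=$(mktemp -d) && cd "$tmp"
touch new.txt           # creates an empty file if it does not exist
[ -e new.txt ] && [ ! -s new.txt ] && echo "new.txt exists and is empty"
touch older.txt
sleep 1
touch newer.txt
[ newer.txt -nt older.txt ] && echo "newer.txt is newer"
sleep 1
touch older.txt         # refresh the timestamp of an existing file
[ older.txt -nt newer.txt ] && echo "older.txt is now the newest"
```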
Introduction to the "for" loop
tail -n +2 DATA.xcnv | cut -f1 | sort | uniq | head > samples.txt
for sample in `cat samples.txt`; do touch $sample.txt; done
ls -lh Sample*
for sample in `cat samples.txt`; do
mv $sample.txt $sample.csv;
done
ls -lh Sample*
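The `cat samples.txt` loop splits on any whitespace, so a sample name containing a space would be treated as two names. Reading one line at a time avoids that. A sketch with made-up sample names:

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Made-up sample names, one of them containing a space
printf 'Sample_01\nSample_02\nSample 03\n' > samples.txt

# read -r takes one whole line at a time; quoting "$sample" keeps spaces intact
while IFS= read -r sample; do
  touch "$sample.txt"
done < samples.txt
ls Sample*
```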
Variables (i)
i=1
name=Leandro
count=`wc -l adhd.map`
echo $i
echo $name
echo $count
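With count=`wc -l adhd.map`, the variable holds the number and the file name, because wc prints the name of any file it is given. Reading from standard input gives only the number, and $(...) is the modern equivalent of backticks. A sketch with a made-up three-line file:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf 'a\nb\nc\n' > toy.map   # made-up 3-line file
with_name=$(wc -l toy.map)       # number plus file name
count=$(wc -l < toy.map)         # only the number (may be space-padded on macOS)
count=$((count))                 # arithmetic expansion strips any padding
echo "$with_name"
echo "$count"
```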
Variables (ii)
# Examples
bwa=/home/users/llima/tools/bwa
hg19=/references/hg19.fasta
# Do not run
$bwa index $hg19
System variables
echo $HOME
echo $USER
echo $PWD
# colon-separated list of directories where bash looks for your programs
echo $PATH
Exercise
1. Create a program that shows input parameters/arguments
2. Create a program (say, "fields", or "colnames") that prints the column names of a <tab>-delimited file (example: DATA.xcnv)
3. Copy this program to a directory in your PATH
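One possible answer to exercise 2, sketched as a shell function on a made-up file. The name fields is only the suggestion from the exercise; saved as a script, the body would work the same way with $1 as the file argument.

```shell
# Print the column names of a tab-delimited file, one per line, numbered
fields() {
  head -n 1 "$1" | tr '\t' '\n' | cat -n
}

tmp=$(mktemp -d) && cd "$tmp"
printf 'SAMPLE\tCNV\tINTERVAL\nS1\tDEL\tchr1:1-2\n' > toy.xcnv   # made-up file
fields toy.xcnv
```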
Running a bash script (i)
cat > arguments.sh
#!/bin/bash
echo Your program is $0
echo Your first argument is $1
echo Your second argument is $2
echo You entered $# parameters.
# Press Ctrl+D (end of input) to finish "cat"
Running a bash script (ii)
bash arguments.sh
bash arguments.sh A B C D E
ls -lh arguments.sh
-rw-r--r--
# First character:
b Block special file
c Character special file
d Directory
l Symbolic link
s Socket
p FIFO (named pipe)
- Regular file
chmod - set permissions (i)
Next characters: three rwx triplets (read, write, execute) for user, group, others
ls -lh arguments.sh
-rw-r--r--
# Everybody can read
# Only the user (owner) can write/modify
chmod - set permissions (ii)
# Add writing permission to group
chmod g+w arguments.sh
ls -lh arguments.sh
# Remove writing permission from group
chmod g-w arguments.sh
ls -lh arguments.sh
# Add execution permission to all
chmod a+x arguments.sh
ls -lh arguments.sh
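chmod also accepts octal modes, where each digit is the sum of read (4), write (2) and execute (1) for user, group and others, in that order. A small sketch on a throwaway file:

```shell
tmp=$(mktemp -d) && cd "$tmp"
touch demo.sh
chmod 644 demo.sh                # rw- r-- r--  (6 = 4+2, 4 = read only)
ls -l demo.sh | cut -c1-10
chmod 755 demo.sh                # rwx r-x r-x  (7 = 4+2+1, 5 = 4+1)
ls -l demo.sh | cut -c1-10
```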
chmod - set permissions (iii)
# Run the script directly (it is now executable)
./arguments.sh
./arguments.sh A B C D E
# Change the name
mv arguments.sh arguments
# Copy to a directory in your PATH (showing on Mac)
sudo cp arguments /usr/local/bin/
# Go to another directory
# Type argu<Tab>, and "which arguments"
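Copying to /usr/local/bin needs sudo. A common alternative that needs no administrator rights is a personal bin directory added to PATH. A sketch, where a temporary directory stands in for something like ~/bin and the one-line script is a cut-down, hypothetical version of arguments:

```shell
# Temporary directory standing in for a personal ~/bin
mybin=$(mktemp -d)
printf '#!/bin/sh\necho "You entered $# parameters."\n' > "$mybin/arguments"
chmod a+x "$mybin/arguments"
export PATH="$mybin:$PATH"     # to make it permanent, add a similar line to your shell's startup file (e.g. ~/.bashrc)
command -v arguments           # prints the full path found through PATH
arguments A B C
```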
Run your program again