one-liner | Assist.Prof.Dr. Alper YILMAZ

Transpose a matrix with perl nested map

Here’s the matrix that we’ll be using:

$ paste <(seq 1 5) <(seq 12 16)
1	12
2	13
3	14
4	15
5	16

Now, let’s use a perl one-liner with nested maps to transpose the matrix:

$ paste <(seq 1 5) <(seq 12 16) | perl -ane 'push @matrix,[@F]; END { print join "\n",map {$col=$_; join "\t",map { $matrix[$_][$col]} 0 .. $#matrix } 0 .. $#{$matrix[0]}; print "\n" }'
1	2	3	4	5
12	13	14	15	16

The outer map walks over the column indices and the inner map over the row indices, so each output line collects one column of the input. I got the idea from this blog post, but I slightly modified it so that you don’t need to make a copy of the transposed array (to save memory).

Extract intervals from an array of numbers

Let’s assume you have an array of numbers and you want to extract the runs of consecutive values as intervals. For example, from the array 2,3,4,5,8,9,10,11,12,15,18,19,20 you should get (2-5), (8-12) and (18-20) as intervals. A more bioinformatics-oriented case: suppose you have samtools pileup output and want to extract intervals from the genomic coordinates that have at least one hit.

Extract upstream region sequence with bedtools

Soon after SAM/BAM became the standard format for short-read alignment software, high-caliber tools that can process it began to emerge. bedtools is one of them: it’s easy to use, flexible and, most importantly, can be integrated into command-line pipes. In this post, I’ll describe how to extract upstream region sequences with the help of bedtools. I’ll be using the following files in my example:

Plot one-liner generated data with gnuplot

In this post, I’ll demonstrate how to use gnuplot in a one-liner. We’ll use a pipe, but unfortunately you cannot pipe raw data to gnuplot directly (as far as I know). The piped data must start with basic gnuplot commands. So, we’ll use the following template:

very-complicated-data-generating-commands | sed -e "1i\plot '-' " | gnuplot -persist

If you’d like to quickly see how this works, try something simple:
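As one simple thing to try (a sketch, not the post's own example), the hypothetical function `with_plot_header` below does the same job as the sed step in the template: it puts a plot command in front of the piped data. It uses echo and cat instead of sed so that it does not rely on GNU sed's one-line form of the `i` command, and it also appends the `e` line that gnuplot accepts as the end of inline data.

```shell
# with_plot_header: hypothetical helper name, not from the original post.
# Wraps the data arriving on stdin in a minimal gnuplot script.
with_plot_header() {
  echo "plot '-' with lines"   # '-' tells gnuplot to read the data inline
  cat                          # pass the piped data through unchanged
  echo "e"                     # marks the end of the inline data block
}

# Example call (needs gnuplot and a display, so it is left as a comment):
#   seq 1 10 | perl -lne 'print $_ ** 2' | with_plot_header | gnuplot -persist
```

With a single column of data, gnuplot plots each value against its row number.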

perl one-liner to pick random sequences from fasta file

In an earlier post we learned how to use the Bio::SeqIO module to process fasta files with a one-liner. Let’s do more with this capability. What about selecting random sequences from a fasta file? To achieve that, we’ll load the fasta file contents into a hash and then use the fact that rand(@array) returns a random number below the array’s length, which becomes the index of a random element once truncated to an integer. Let’s pick 100 random sequences from a fasta file with a one-liner:

Way more practical one-liners with perl5i

The perl5i project describes itself as “Perl 5 has a lot of warts, fix as much of it as possible in one pragma”. You can run your scripts with it by including perl5i (i.e., use perl5i;). The best part is that it can be run at the command line with $ perl5i -e . perl5i includes the autobox module, which lets you call methods on primitive datatypes such as scalars and arrays (e.g. "hello world"->print). This feature allows constructing very compact one-liners, as shown below:

Most used commands in history

Most of the “most used commands” approaches do not consider pipes and other complexities. This approach considers pipes, process substitution, command substitution with backticks or $(), and multiple commands separated by ;. A Perl regular expression breaks up each line at |, <(, ;, ` or $( and picks the first word of each piece (excluding “do” in the case of for loops):

history | perl -F"\||<\(|;|\`|\\$\(" -alne 'foreach (@F) { print $1 if /\b((?

One line statistics

Let’s assume we have a file with five columns, where the first column is text and the rest are numeric. How can we calculate the standard deviation (or another statistic) of each row with a perl one-liner? We’ll use the Statistics::Descriptive module.

perl -MStatistics::Descriptive -ane 'BEGIN{our $stat = Statistics::Descriptive::Full->new}; $stat->add_data(@F[1..4]); print $stat->standard_deviation,"\n"; $stat->clear' filename

The $stat->clear at the end is needed because add_data appends to the data already held by $stat rather than replacing it. Without clearing the object after each line, the statistics would accumulate over all lines read so far.

perl one-liner to process sequence files in stream

Need a practical way to process fasta files with the Bio::SeqIO module? The code below prints the sequence id and sequence length, separated by a tab, one sequence per line.

perl -MBio::SeqIO -e '$seq=Bio::SeqIO->new(-fh => \*STDIN);while ($myseq=$seq->next_seq){print $myseq->id,"\t",$myseq->length,"\n";}' < filename

OR

cat filename | perl -MBio::SeqIO -e '$seq=Bio::SeqIO->new(-fh => \*STDIN);while ($myseq=$seq->next_seq){print $myseq->id,"\t",$myseq->length,"\n";}'

There are many more methods available from Bio::Seq, such as revcom, translate, subseq(start,end), primary_id, desc, etc. The piped file does not need to be in Fasta format; there are many other formats (listed here) that SeqIO can parse successfully. For those, the format can be named explicitly with the -format argument of Bio::SeqIO->new.

Top ten occurrences with perl one-liner

A very nice perl one-liner that uses map, sort and an array slice to show the top ten occurrences (taken from the Tech@Sakana blog):

perl -ane '$c{$F[0]}++; END {print map {$_ . "\t->\t" . $c{$_} . "\n"} (sort {$c{$b} <=> $c{$a}} keys %c)[0..9]}' filename

The one-liner counts the values in the first column. For a file with a single column, the same thing can be achieved with:

sort filename | uniq -c | sort -nr | head

But the perl one-liner demonstrates the nice combination of sort and map.