Linux
Running commands in parallel
Useful external tools
- ripgrep
- fd
- zet
- rush
- xan
The Linux shell
Piping and redirecting
Piping and redirecting are fundamental concepts in Linux that control how data flows between commands and files. They let you chain commands together and manage input and output streams.
Standard streams
stdin
: Data flows into a program.
stdout
: Data flows out of a program.
stderr
: Error messages flow out of a program separately.
graph LR
  INPUT["Data sources"]
  subgraph PROGRAM["Process"]
    STDIN["stdin"] --> PROC["🖥️"]
    PROC --> STDOUT["stdout"]
    PROC --> STDERR["stderr"]
  end
  OUTPUT["Data destinations"]
  INPUT --> STDIN
  STDOUT --> OUTPUT
  STDERR --> OUTPUT
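To see that stdout and stderr really are separate streams, here is a minimal sketch. The emit function is a hypothetical stand-in for any program that writes to both streams, and the demo files live in a throwaway directory created with mktemp.

```shell
# Hypothetical stand-in for a program that uses both output streams
emit() {
  echo "result"        # written to stdout
  echo "warning" >&2   # written to stderr
}

tmp=$(mktemp -d)       # throwaway directory for the demo files

# Send each stream to its own file
emit > "$tmp/out.txt" 2> "$tmp/err.txt"

out=$(cat "$tmp/out.txt")
err=$(cat "$tmp/err.txt")
echo "stdout contained: $out"
echo "stderr contained: $err"

rm -r "$tmp"
```

Because the two streams are independent, each one can be saved, discarded, or piped without affecting the other.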
Piping
The pipe operator | connects the stdout of one command to the stdin of another command. The first command's stderr is not sent through the pipe; by default it still goes to the terminal.
graph LR
  CMD1("Command 1")
  CMD2("Command 2")
  TERM("Terminal")
  CMD1 -->|"stdout"| PIPE("|")
  PIPE -->|"stdin"| CMD2
  CMD1 -->|"stderr"| TERM
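# awk prints the FASTA header lines (those starting with ">") → wc counts them
awk '/^>/' genome.fasta | wc -l

# ls writes to stdout → grep reads from stdin
# (the dot is escaped so it matches a literal ".", and $ anchors the match to the end of the name)
ls -la | grep '\.txt$'

# cat writes file content to stdout → wc reads from stdin
cat file.txt | wc -l

# Multiple pipes create a pipeline
ps aux | grep firefox | awk '{print $2}'
#      ↑              ↑
# stdout→stdin   stdout→stdin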
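The following sketch illustrates that only stdout travels through a pipe. It assumes a hypothetical emit function that writes one line to each stream, and counts how many lines reach wc with and without merging stderr into stdout.

```shell
# Hypothetical stand-in for a program that uses both output streams
emit() {
  echo "data"          # stdout
  echo "oops" >&2      # stderr
}

# Only stdout enters the pipe. The stderr of the whole group is discarded
# here purely to keep the demo quiet; it never reaches wc either way.
stdout_only=$({ emit | wc -l; } 2>/dev/null | tr -d ' ')

# 2>&1 merges stderr into stdout before the pipe, so wc sees both lines
both=$(emit 2>&1 | wc -l | tr -d ' ')

echo "lines through the pipe by default: $stdout_only"
echo "lines through the pipe with 2>&1:  $both"
```

The tr -d ' ' step only strips the padding that some wc implementations add around the count.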
Redirecting
| Operator | Description | Example |
|---|---|---|
| > | Redirect output to file (overwrite) | echo "Hello" > file.txt |
| >> | Redirect output to file (append) | echo "World" >> file.txt |
| 2> | Redirect errors to file | command 2> errors.log |
| &> | Redirect both output and errors to file (Bash shorthand for > file 2>&1) | command &> all.log |
| 2>&1 | Merge error stream (stderr → stdout) | command > /dev/null 2>&1 |
| < | Read input from file | sort < names.txt |
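One detail the table does not show is that redirections are applied from left to right, so the position of 2>&1 matters. The sketch below, which uses a hypothetical emit function and temporary files, compares the two orderings.

```shell
# Hypothetical stand-in for a program that uses both output streams
emit() { echo "out"; echo "err" >&2; }

tmp=$(mktemp -d)

# stdout is pointed at the file first, then stderr is pointed at the same place:
# both lines land in a.log
emit > "$tmp/a.log" 2>&1
a_lines=$(wc -l < "$tmp/a.log" | tr -d ' ')

# Reversed order: stderr is pointed at the *current* stdout (captured by $(...))
# before stdout is moved to the file, so only stdout reaches b.log
leaked=$(emit 2>&1 > "$tmp/b.log")
b_content=$(cat "$tmp/b.log")

echo "a.log has $a_lines lines"
echo "b.log contains '$b_content'; '$leaked' went elsewhere"

rm -r "$tmp"
```

This is why the example in the table writes command > /dev/null 2>&1 and not the other way around.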
The sponge command
Redirecting a command's output back to the file it is reading from silently destroys the data. Consider this example, where you want to sort the sequences in a FASTA file by length using SeqKit and save the result back to the same file:
seqkit sort -l sequences.fasta > sequences.fasta
This command empties sequences.fasta completely. The shell processes the > redirection and truncates the file before seqkit gets a chance to read it, resulting in total data loss.
The sponge command solves this problem by reading all of its input first and only then writing to the file:
seqkit sort -l sequences.fasta | sponge sequences.fasta
Since sponge isn’t included in standard Linux installations, you’ll need to install it through the moreutils package. Using Pixi:
pixi global install moreutils
The moreutils package includes sponge along with several other handy command-line tools.
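If installing moreutils is not an option, a temporary file followed by mv gives a similar result. The sketch below is an illustration, not part of the original workflow: it uses plain sort on a small text file as a stand-in for seqkit, first reproducing the truncation problem and then applying the workaround.

```shell
tmp=$(mktemp -d)
printf 'b\na\nc\n' > "$tmp/data.txt"

# Unsafe: the shell truncates unsafe.txt before sort starts reading it
cp "$tmp/data.txt" "$tmp/unsafe.txt"
sort "$tmp/unsafe.txt" > "$tmp/unsafe.txt"
unsafe_size=$(wc -c < "$tmp/unsafe.txt" | tr -d ' ')

# Safe: write to a temporary file, then replace the original only on success
sort "$tmp/data.txt" > "$tmp/data.txt.tmp" && mv "$tmp/data.txt.tmp" "$tmp/data.txt"
sorted=$(cat "$tmp/data.txt")

echo "bytes left after the unsafe redirect: $unsafe_size"
echo "result of the safe variant:"
echo "$sorted"

rm -r "$tmp"
```

The && ensures the original file is replaced only if the command succeeded, which a plain redirect cannot guarantee.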
/dev/stdin and /dev/stdout
/dev/stdin and /dev/stdout are special file paths that map directly to a process’s standard input and output streams. They allow you to treat these streams like regular files. These pseudo-files are particularly useful when working with programs that accept only file names and cannot read from or write to a pipe directly. Instead of creating temporary files, you can:
- Pass /dev/stdin as a filename argument to programs that expect to read from a file, allowing them to read from piped input instead.
- Use /dev/stdout as a filename argument when programs expect to write to a file, redirecting their output to the terminal or another piped command.
One use case is MUSCLE, a tool for multiple sequence alignment that only accepts input and output as files. To make MUSCLE read from stdin and display the output in the terminal (stdout), you can run it like this:
echo -e ">seq1\nMYYGR\n>seq2\nMRYR" | muscle -align /dev/stdin -output /dev/stdout
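The same idea can be tried without MUSCLE installed. In this sketch, upper_file is a hypothetical function that mimics a file-only tool: it takes an input path and an output path as arguments and never reads stdin or writes stdout on its own. It assumes a Linux system where /dev/stdin and /dev/stdout exist.

```shell
# Hypothetical file-only tool: reads the path in $1, writes to the path in $2
upper_file() {
  tr 'a-z' 'A-Z' < "$1" > "$2"
}

# Passing the pseudo-files lets the tool take part in a pipeline anyway
result=$(echo "acgt" | upper_file /dev/stdin /dev/stdout)
echo "$result"
```

This approach may not work with programs that need to seek within their input or output, since pipes only support sequential access.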
Common combinations
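# Sort a file, drop duplicate lines, and save the result
cat file.txt | sort | uniq > sorted.txt

# Show stdout and stderr in the terminal while also saving both to a log
command 2>&1 | tee output.log

# Search for log files, discarding error messages such as "Permission denied"
find . -name "*.log" 2> /dev/null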
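As a small check of the tee combination, the sketch below runs a hypothetical emit function in place of a real command and confirms that what tee passes on matches what it writes to the log file.

```shell
# Hypothetical stand-in for a command that reports progress and a problem
emit() { echo "step 1"; echo "problem" >&2; }

tmp=$(mktemp -d)

# tee copies its input to the log file and passes it on to stdout
shown=$(emit 2>&1 | tee "$tmp/output.log")
logged=$(cat "$tmp/output.log")

echo "passed on by tee:"
echo "$shown"

rm -r "$tmp"
```

Use tee -a instead if the log should be appended to and not overwritten on each run.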
Process substitution
# Each <(...) runs a command and hands its output to zet as a readable file path
zet intersect <(echo -e "A\nB\nC") <(echo -e "B\nC\nD")
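Process substitution also works with standard utilities that expect file arguments. The sketch below assumes Bash (plain POSIX sh does not support the <(...) syntax) and uses comm and diff in place of zet; comm -12 prints only the lines common to two sorted inputs.

```shell
#!/usr/bin/env bash
# Lines present in both inputs (comm requires sorted input)
common=$(comm -12 <(printf 'A\nB\nC\n') <(printf 'B\nC\nD\n'))
echo "$common"

# diff can compare the output of two commands the same way,
# here checking that two lists match once sorted
if diff <(printf 'x\ny\n' | sort) <(printf 'y\nx\n' | sort) > /dev/null; then
  same="yes"
else
  same="no"
fi
echo "identical after sorting: $same"
```

In both cases no temporary files are created; the shell supplies each command's output under a path the program can open like a file.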