
[Note: question heavily edited to correspond to the actual problem]

I'm trying to debug a command that fails only under specific conditions. It exits with code 140, but I have no other information.

The command is cat in_file | tr "\t" "\n" > out_file, and it is part of a Nextflow script, which is in turn run on a cluster with the SLURM scheduler.

Since tr is part of GNU coreutils, I checked the man page and the info documentation, which only mention that "An exit status of zero indicates success, and a nonzero value indicates failure." This is not a standard error code from errno.h.
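One way to narrow this down: POSIX shells conventionally report "killed by signal N" as exit status 128 + N. Whether that convention applies here depends on how SLURM and Nextflow propagate the status, but decoding it is a cheap first check. A minimal sketch using the shell's own kill -l:

```shell
# Shells report a process killed by signal N as exit status 128 + N.
# Decode 140 under that convention:
sig=$((140 - 128))   # 12
kill -l "$sig"       # prints the name of signal 12 (USR2 on Linux)
```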

Alexlok

2 Answers


The error code you see won't be from tr itself but from either Nextflow or SLURM. A quick search suggests it is indeed SLURM. See, for example, here:

  • A job/ process is not given enough memory or time: pipeline runs on large samples or datasets may require more memory or a higher time limit. When reported correctly, the pipeline will indicate an error status of 140 (for SGE or SLURM environments); however, memory issues can take many forms, and related error messages are not always clear. In this example case, the process PairedEndHisat failed due to insufficient memory, but indicated a general error status (1):

Or, here:

140 The job exceeded the “wall clock” time limit (as opposed to the CPU time limit).

I don't know much about this, so I can't say which explanation is right. However, it at least gives you a general direction: is the process taking a long time? Or, since you are writing to a file, do you have enough disk space for the output?
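Both of those questions can be checked directly. A sketch, assuming a SLURM cluster with accounting enabled; JOBID is a placeholder for the actual job ID, which Nextflow records in the task's work directory:

```shell
# Ask SLURM how the job ended (State, ExitCode, and whether Elapsed
# ran up against Timelimit):
# sacct -j JOBID --format=JobID,State,ExitCode,Elapsed,Timelimit,MaxRSS

# Check free space on the filesystem where out_file is written:
df -h .
```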

terdon

I also encountered the exit code 140 problem with Nextflow on a SLURM backend. In my case, it was not related to memory or the number of CPUs. The issue was resolved by minimizing the priority of the program being executed, although the exact cause remains unknown.

To adapt the command initially presented, I ran it with a lowered priority as follows:

cat in_file | nice -n 20 tr "\t" "\n" > out_file

I acknowledge that this is not a definitive fix. It simply resolved the issue for me after a long day of troubleshooting, and I hope this post can save someone else time. Interestingly, a regular nice without arguments, which defaults to -n 10, was not sufficient. This may suggest a race condition, possibly triggered by a busy network and delays in the file system.
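If you try this, it is worth verifying that the lowered priority actually takes effect. A quick check, assuming a Linux system with procps: print the niceness from inside the niced command (most systems cap niceness at 19, so -n 20 is clamped):

```shell
# Print the niceness the child process actually runs at.
# $$ is the PID of the inner shell, which inherits the niceness set by nice.
nice -n 20 sh -c 'ps -o ni= -p $$'
```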

smoe