Backup job (cron) never finished, now stuck


#1

I’ve had Vertical backing up a couple of VMs for a few weeks now, and it has been pretty reliable. About 4 days ago there was some hiccup, and I guess the backup never finished.

But there seems to be no timeout, and now the daily cron job is failing with “BACKUP_BUSY Another backup job is in progress”.

I found these instructions for forcefully killing the backup (seems a bit hazardous; will this corrupt anything?):

But I’m hoping there is a safer, saner way to keep this from happening. Paid user here, thank you.


#2

I think it is most likely that the previous backup was aborted but left the plock file behind. You can find this plock file in the hidden .verticalbackup folder and simply remove it.

Even if you run pkill vertical to kill an ongoing backup it won’t corrupt anything.
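For anyone who wants to script the lock cleanup, here is a minimal sketch. The function name `clear_stale_lock` is made up for illustration, and the lock file path is passed in as an argument because the exact file name inside .verticalbackup isn't given in this thread. It only removes the file when pgrep finds no matching process.

```shell
#!/bin/sh
# Hypothetical helper (not part of Vertical Backup): remove a leftover lock
# file only when no process with the given name is running.
# Usage: clear_stale_lock LOCKFILE PROCESS_NAME
clear_stale_lock() {
    lockfile=$1
    procname=$2

    # If a matching process still exists, the lock may be legitimate.
    if pgrep "$procname" >/dev/null 2>&1; then
        echo "$procname is still running; leaving $lockfile alone" >&2
        return 1
    fi

    # No matching process: the lock file, if present, is stale.
    if [ -e "$lockfile" ]; then
        rm -f "$lockfile"
        echo "removed stale lock $lockfile"
    fi
    return 0
}
```

You would call it with the real path to the plock file and `vertical` as the process name, e.g. at the top of the cron script before the backup starts.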


#3

It wasn’t the plock file in this case: there were 7 instances of vertical running. I had to kill them manually, and then the backup was able to run again. Hopefully it was a random fluke. A timeout feature would be very nice to have, e.g. fail if a backup has been running for more than 24 hours.
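Until something like that exists, a watchdog in the cron script can approximate it. This is a rough sketch, not a Vertical Backup feature: `run_with_timeout` is a made-up name, and the exit code 124 on timeout is borrowed from the GNU `timeout` convention. It starts the command in the background, polls once a second, and kills it when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper: run a command and kill it if it exceeds a time limit.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or 124 if it was killed on timeout.
run_with_timeout() {
    limit=$1
    shift

    "$@" &
    cmd_pid=$!
    elapsed=0

    # Poll once per second while the command is still alive.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # Command ended on its own; pass its exit status through.
    wait "$cmd_pid"
}
```

In the cron script you would put `run_with_timeout 86400` in front of the existing backup command for a 24-hour limit. Note that it only kills the process it started, so if several vertical instances are left behind they may still need a separate cleanup.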


#4

@gchen I’ve bought 2 more licenses and set up new hosts. Still having this problem quite often. It’s a real PITA. Any more thoughts about a --timeout option?


#5

Did it happen to all three hosts? When it gets stuck again, run pgrep vertical to find the pid of the first Vertical Backup process, and then strace -fp pid to trace the system calls. The output of strace should indicate in which call it gets stuck.
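The two steps can be combined so the trace goes to a file instead of the terminal, which makes it easier to share. A small sketch, assuming strace is available on the host; `trace_first` is just an illustrative name, and the output path is whatever you choose.

```shell
#!/bin/sh
# Hypothetical helper: attach strace to the first process matching a name
# and write the trace to a file.
# Usage: trace_first PROCESS_NAME OUTPUT_FILE
trace_first() {
    # Take the first pid that pgrep prints.
    pid=$(pgrep "$1" | head -n 1)
    if [ -z "$pid" ]; then
        echo "no process matching $1" >&2
        return 1
    fi

    # -f follows child processes, -p attaches to the pid, -o writes to a file.
    strace -f -p "$pid" -o "$2"
}
```

For example, `trace_first vertical /tmp/vertical.strace`, then press Ctrl-C after a minute or so to detach. The last lines of the file should show the call it is stuck in.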


#6

It’s happened to 2 of the hosts so far, both using the B2 backend. I already killed it to get it moving again, but if it happens again I’ll run the strace. Thank you.


#7

@gchen I had a backup get stuck again (ran >13 hrs) and I ran the strace commands. The output is large. Is there a way I can send it to you via private message or email?


#8

This 1.3.6 version should fix the bug: https://acrosync.com/esxi/vertical

You’ll need to download the binary on a desktop computer and scp to the ESXi host since wget there doesn’t accept the https protocol.


#9

Thank you. I have v1.3.6 installed now on 3 hosts, fingers crossed. I’ll post status updates here.


#10

@gchen Got my first successful cron backup from the troublesome host last night! So far so good. Thank you again :)