Oh, good point about the no-chunk-listing option! I tried that, and this just keeps getting weirder. Everything below was done with the new 1.4.0 release.
First, I ran a backup with -vv. It yielded no mention of the 89eb chunk in the log, but after backing up, check failed for the backed-up machine:
Chunk 89eb8415019cda9a27d5d95eeb06efcb7297ec4e404698223866f39a8e1f89c6 referenced by snapshot Dummy@stronger at revision 1 does not exist
Some chunks referenced by snapshot Dummy@stronger at revision 1 are missing
So, that’s as expected. I removed the snapshot and tried again with no-chunk-listing. After this, check succeeded! But it still failed for other VMs referring to the same chunk:
All chunks referenced by snapshot Dummy@stronger at revision 1 exist
Chunk 89eb8415019cda9a27d5d95eeb06efcb7297ec4e404698223866f39a8e1f89c6 referenced by snapshot Web@stronger at revision 47 does not exist
Some chunks referenced by snapshot Web@stronger at revision 47 are missing
The same check succeeded for the Dummy machine. I guess the chunk wasn’t needed for that machine this time around? That’s very weird, since the machine is not powered on and the backups were done within an hour of each other.
I therefore tried a backup of the Web VM with no-chunk-listing, hoping it would at least produce a recent backup without issues, or at best re-upload the missing chunk. Alas, after this backup, the same chunk is now missing from the two latest snapshots:
Chunk 89eb8415019cda9a27d5d95eeb06efcb7297ec4e404698223866f39a8e1f89c6 referenced by snapshot Web@stronger at revision 47 does not exist
Some chunks referenced by snapshot Web@stronger at revision 47 are missing
Chunk 89eb8415019cda9a27d5d95eeb06efcb7297ec4e404698223866f39a8e1f89c6 referenced by snapshot Web@stronger at revision 48 does not exist
Some chunks referenced by snapshot Web@stronger at revision 48 are missing
There is still no mention of the chunk in the log:
[root@stronger:/vmfs/volumes/5d4298be-b92d7a2e-87d0-78acc0faa2a0/verticalbackup] grep 89 backup.log | grep eb84
2020-03-10 15:43:30.308189 ERROR CHECK_MISSING Backup Dummy@stronger revision 1: chunk 89eb8415019cda9a27d5d95eeb06efcb7297ec4e404698223866f39a8e1f89c6 missing
2020-03-10 16:45:36.858799 ERROR CHECK_MISSING Backup Web@stronger revision 47: chunk 89eb8415019cda9a27d5d95eeb06efcb7297ec4e404698223866f39a8e1f89c6 missing
(I split the grep into two pieces so it also matches chunk URLs, where the hash is broken up as chunks/89/eb84…); both mentions come from checks.
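For reference, the two-piece grep matters because the remote path splits the hash after its first two hex digits, so a single grep for the full hash would miss the URL form. A quick sketch using a fabricated sample.log (invented contents, not the real log):

```shell
# Fabricated sample.log showing the two forms a chunk hash takes in the
# log: the bare hash, and a URL with a slash after the first two digits.
cat > sample.log <<'EOF'
2020-03-10 15:43:30 ERROR CHECK_MISSING chunk 89eb8415019cda9a missing
2020-03-10 16:07:10 DEBUG B2_CALL URL request .../chunks/89/eb8415019cda9a returned 404
EOF

# Piping two greps matches both forms; grepping for the full
# hash in one piece would only match the first line.
grep 89 sample.log | grep eb84
```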
I figured the upload failure might be slightly non-deterministic, though very likely to recur (it has failed on that chunk fairly consistently), so I removed the snapshot and tried another backup of the Dummy VM. This time something interesting happened: the check now reports errors on 3 entirely new chunks:
Chunk 24d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b referenced by snapshot Dummy@stronger at revision 1 does not exist
Chunk 2befc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 referenced by snapshot Dummy@stronger at revision 1 does not exist
Chunk 961086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 referenced by snapshot Dummy@stronger at revision 1 does not exist
Some chunks referenced by snapshot Dummy@stronger at revision 1 are missing
I manually checked the 2bef chunk, and indeed cannot see it in B2. I checked the log: all three missing chunks are mentioned there, and the entries look perfectly normal:
[root@stronger:/vmfs/volumes/5d4298be-b92d7a2e-87d0-78acc0faa2a0/verticalbackup] grep d8723e78a96e20f backup.log
2020-03-10 16:07:16.438307 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/24/d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b returned status code 404
2020-03-10 16:07:17.736062 TRACE CHUNK_UPLOAD Uploaded chunk 24d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b size 104881
2020-03-10 16:07:17.764992 DEBUG CHUNK_CACHE Saved chunk 24d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b to the local cache
[root@stronger:/vmfs/volumes/5d4298be-b92d7a2e-87d0-78acc0faa2a0/verticalbackup] grep efc09bd32f28915e53 backup.log
2020-03-10 16:07:12.806573 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/2b/efc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 returned status code 404
2020-03-10 16:07:13.210015 TRACE CHUNK_UPLOAD Uploaded chunk 2befc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 size 1472
2020-03-10 16:07:13.335349 DEBUG CHUNK_CACHE Saved chunk 2befc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 to the local cache
[root@stronger:/vmfs/volumes/5d4298be-b92d7a2e-87d0-78acc0faa2a0/verticalbackup] grep 1086c1058d165a1 backup.log
2020-03-10 16:07:14.066779 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/96/1086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 returned status code 404
2020-03-10 16:07:15.765799 TRACE CHUNK_UPLOAD Uploaded chunk 961086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 size 702831
2020-03-10 16:07:15.820098 DEBUG CHUNK_CACHE Saved chunk 961086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 to the local cache
Looking at the log, these three chunks are the last pieces uploaded before the backup finishes:
2020-03-10 16:07:08.137757 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10478/10486)
2020-03-10 16:07:08.243092 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10483/10486)
2020-03-10 16:07:08.454590 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10484/10486)
2020-03-10 16:07:09.437713 TRACE CHUNK_DEDUP Skipped duplicate chunk 42096f4749783683aa92e791f920d16325de4829ea197c93467bb446f1aafc21 in the storage (10485/10486)
2020-03-10 16:07:10.352848 INFO BACKUP_UPLOAD Uploaded file CentOS 7 Template-flat.vmdk 252.43MB/s 00:17:18
2020-03-10 16:07:10.603772 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/59/0ba3f01989ddf112ec6a1fefba7a795934146fa53ce8aff2dec963b8e15a63 returned status code 404
2020-03-10 16:07:12.142725 TRACE CHUNK_UPLOAD Uploaded chunk 590ba3f01989ddf112ec6a1fefba7a795934146fa53ce8aff2dec963b8e15a63 size 3819
2020-03-10 16:07:12.143151 INFO BACKUP_UPLOAD Uploaded file CentOS 7 Template.vmx
2020-03-10 16:07:12.393295 TRACE CHUNK_DEDUP Skipped duplicate chunk 93f1159ad032dd8b059fa22d9494594ce77da21ec179e41d2a955307ad6b6521 in the storage
2020-03-10 16:07:12.393617 INFO BACKUP_UPLOAD Uploaded file CentOS 7 Template.vmxf
2020-03-10 16:07:12.642063 TRACE CHUNK_DEDUP Skipped duplicate chunk 5d059c1a461b8b6058366deb1118e69065a5660042603ce06efdf4a8b449c9a5 in the storage
2020-03-10 16:07:12.642537 INFO BACKUP_UPLOAD Uploaded file CentOS 7 Template.nvram
2020-03-10 16:07:12.806573 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/2b/efc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 returned status code 404
2020-03-10 16:07:13.210015 TRACE CHUNK_UPLOAD Uploaded chunk 2befc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 size 1472
2020-03-10 16:07:13.335349 DEBUG CHUNK_CACHE Saved chunk 2befc09bd32f28915e5376441781661c59821eb95407ebf53889dd28afc6c3c7 to the local cache
2020-03-10 16:07:14.066779 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/96/1086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 returned status code 404
2020-03-10 16:07:15.765799 TRACE CHUNK_UPLOAD Uploaded chunk 961086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 size 702831
2020-03-10 16:07:15.820098 DEBUG CHUNK_CACHE Saved chunk 961086c1058d165a1b21ab55e97c528858bb0af6492a5e9e603ec1fa741726d4 to the local cache
2020-03-10 16:07:16.438307 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/24/d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b returned status code 404
2020-03-10 16:07:17.736062 TRACE CHUNK_UPLOAD Uploaded chunk 24d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b size 104881
2020-03-10 16:07:17.764992 DEBUG CHUNK_CACHE Saved chunk 24d8723e78a96e20fd6d50b283ebce79a10b487ad92e089bb249b587f4e7440b to the local cache
2020-03-10 16:07:19.542225 INFO BACKUP_DONE Backup Dummy@stronger at revision 1 has been successfully completed
2020-03-10 16:07:19.542521 INFO BACKUP_STATS Total 10493 chunks, 262144.78M bytes; 82 new, 1925.78M bytes, 924.75M uploaded
2020-03-10 16:07:19.542681 INFO BACKUP_TIME Total backup time: 00:17:29
2020-03-10 16:07:19.554111 TRACE CACHE_CLEAN Cleaning local cache
2020-03-10 16:07:19.633480 DEBUG CACHE_LIST Listing chunks referenced in backup Naming@stronger at revision 51 in the local cache
2020-03-10 16:07:19.676861 DEBUG CACHE_LIST Listing chunks referenced in backup DB@stronger at revision 65 in the local cache
2020-03-10 16:07:19.721371 DEBUG CACHE_LIST Listing chunks referenced in backup Web@stronger at revision 48 in the local cache
I notice that the three failed chunks have no chunk number after the size, unlike other chunks uploaded earlier:
2020-03-10 16:06:50.639517 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10184/10486)
2020-03-10 16:06:50.739613 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10185/10486)
2020-03-10 16:06:50.755938 TRACE CHUNK_UPLOAD Uploaded chunk fe7c6f85ab0bc54af135230c073f6c21807e173c12d4d7276db5cd1e14f3972c size 26214400 (10005/10486)
2020-03-10 16:06:50.767888 INFO UPLOAD_PROGRESS Uploading 249.92MB/s 00:00:29 97.1%
2020-03-10 16:06:50.937367 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10186/10486)
2020-03-10 16:06:51.053563 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10187/10486)
I also notice that the sizes are different and that these chunks are declared done immediately (while there are around 6 seconds between chunk 10005 and the chunks surrounding it):
2020-03-10 16:06:31.891889 TRACE CHUNK_DEDUP Skipped duplicate chunk b49e888638cab1e9593f12129302fed612f368b25faa9be36b1c71cf9513bbdd in the storage (10003/10486)
2020-03-10 16:06:32.203188 TRACE CHUNK_DEDUP Skipped duplicate chunk 1ac667bf0253db09d14ae062f527a0f06db5d54ce63e48c5c0cd3c0272ff6a9a in the storage (10004/10486)
2020-03-10 16:06:32.659249 DEBUG B2_CALL URL request https://f001.backblazeb2.com/file/stronger/chunks/fe/7c6f85ab0bc54af135230c073f6c21807e173c12d4d7276db5cd1e14f3972c returned status code 404
2020-03-10 16:06:34.577721 TRACE CHUNK_UPLOAD Uploaded chunk 4007b4648395b5444c2e0ae48e33d8d2c2847807e09bf85c703bd890e6559590 size 26214400 (9982/10486)
2020-03-10 16:06:34.588461 INFO UPLOAD_PROGRESS Uploading 249.39MB/s 00:00:48 95.4%
2020-03-10 16:06:35.102507 TRACE CHUNK_DEDUP Skipped duplicate chunk fb3f6d3f3fd85df867ec3c87c0ee45f59c6a3b78dcb1d71cc122ebadd1893a1a in the storage (10006/10486)
2020-03-10 16:06:35.464666 TRACE CHUNK_DEDUP Skipped duplicate chunk e41cafdd950226ff92eb1745b9f16efacabf00cafc445d6728325d8766bf687f in the cache (10007/10486)
2020-03-10 16:06:35.940118 TRACE CHUNK_DEDUP Skipped duplicate chunk 92ee23e40a5b6ddbb0f267731554db5b153fb6e5a227929789851822fe8e7d05 in the storage (10008/10486)
This last bit might hint at why the 89eb chunk went missing in the first place. I do backups with 4 threads; maybe the 3 threads still busy uploading get terminated too soon, so B2 discards their blocks, yet they are still recorded in the local cache?
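The race suspected here can be sketched as follows (a hypothetical illustration with invented file names, not Vertical Backup’s actual code): if the cache entry is written before the remote upload is finalized, a worker thread terminated in between leaves the cache claiming a chunk that B2 has discarded.

```shell
# Hypothetical sketch of the suspected race, assuming the cache entry
# is recorded before B2 confirms the upload.
: > local_cache.txt
: > remote_chunks.txt
CHUNK=89eb8415
THREAD_TERMINATED=1                    # simulate the thread being killed early

echo "$CHUNK" >> local_cache.txt       # step 1: chunk saved to the local cache
if [ "$THREAD_TERMINATED" -eq 0 ]; then
  echo "$CHUNK" >> remote_chunks.txt   # step 2: B2 finalizes the upload
fi                                     # terminated: step 2 never happens

# A later check: chunks the cache claims exist but remote storage lacks
comm -23 local_cache.txt remote_chunks.txt   # → 89eb8415
```

On the next backup, deduplication against the poisoned cache would then skip the chunk without re-uploading it, which would match the chunk never appearing in the backup log.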
I’ll try removing the local cache and doing a backup of Web without multiple threads and with no-chunk-listing, to see if that resolves anything.