Details
- Type: Suggestion
- Resolution: Unresolved
- Priority: P3: Somewhat important
Description
Coin currently uses parallel gzip for compression. Some builds produce as much as 100 GB of artifacts, take more than 5 minutes to compress, and time out. As a result, we resort to not-so-nice hacks, like building the tests in the "Test" workitem, to avoid transferring artifacts.
Zstd has several advantages, among them:
- Always very fast decompression.
- Very configurable compression:
  - At the default level (-3) it is much faster than gzip, with a marginally better ratio.
  - Many useful tweaks for all speed/ratio needs; for example, --long increases the compression ratio considerably, along with the memory requirements.
- Supports multiple threads natively (just pass -T0 on the command line).
- Verifies integrity automatically while decompressing, so no extra verification step is needed (currently we run gzip -t).
- Binaries are available for almost every OS.
I've discussed it extensively with tosaario and there are a couple of potential drawbacks:
- Integration in golang is dubious.
  - However, zstd ships as a mature gzip-like command-line utility and a highly optimized C library. It might be worth piping the data from golang to the zstd utility while compressing/decompressing, or just spawning a shell pipeline for the whole job, for example curl http://... | zstd -d | tar -xf - to stream, decompress, and extract in one step.
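The piping idea above can be sketched in Go with os/exec: spawn an external filter and stream data through its stdin/stdout. This is only an illustration, not Coin's code; it uses cat as a stand-in filter so the sketch runs even where zstd is not installed, whereas a build worker would pass "zstd" (compression) or "zstd", "-d" (decompression) instead:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// pipeThrough streams input through an external filter command and returns
// its output -- the pattern suggested for wrapping the zstd utility from Go.
func pipeThrough(input []byte, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = bytes.NewReader(input)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	// "cat" is a stand-in so the sketch runs without zstd installed;
	// real use would be pipeThrough(data, "zstd", "-T0") or
	// pipeThrough(data, "zstd", "-d").
	out, err := pipeThrough([]byte("artifact data"), "cat")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

For very large artifacts one would wire cmd.Stdin/cmd.Stdout to other pipes instead of in-memory buffers, so the data streams end to end without being held in RAM.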
Issue Links
- split from QTQAINFRA-3100: How to make provisioning of tier2 VMs faster (Closed)