pgzip is a Go library that provides parallel gzip compression and decompression. It serves as a drop-in replacement for compress/gzip, with significant performance improvements through concurrency. This makes it particularly useful for handling large amounts of data efficiently.
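Because the API mirrors the standard library, switching an existing codebase is often as simple as aliasing the import; a minimal sketch:

```go
import (
	// Alias pgzip as gzip so existing gzip.NewWriter / gzip.NewReader
	// call sites continue to compile unchanged.
	gzip "github.com/klauspost/pgzip"
)
```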
Why Use pgzip?
- Parallel processing: Splits compression into blocks processed in parallel.
- Standard gzip compatibility: Compressed files are fully compatible with any gzip reader.
- Efficient decompression: Supports readahead to minimize processing delays.
- Optimized for multicore CPUs: Maximizes compression and decompression speed.
- Automatic CRC handling: Integrity checks run in a separate goroutine.
Installation
To install the library, run:
```sh
go get github.com/klauspost/pgzip/...
```
Updating dependencies is also recommended:
```sh
go get -u github.com/klauspost/compress
```
Using pgzip in Go
Parallel Compression
pgzip works similarly to compress/gzip, with additional options for block size and parallel processing.
```go
package main

import (
	"bytes"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	var b bytes.Buffer
	w := pgzip.NewWriter(&b)

	// Set concurrency: 100 KB per block, up to 10 blocks processed in parallel.
	if err := w.SetConcurrency(100000, 10); err != nil {
		panic(err)
	}

	w.Write([]byte("Hello, world\n"))
	w.Close()

	// Save the compressed data to a file.
	if err := os.WriteFile("file.gz", b.Bytes(), 0644); err != nil {
		panic(err)
	}
}
```
Key considerations:
- SetConcurrency(blockSize, blocks): defines the block size and the number of blocks compressed in parallel.
- Recommended values:
  - Minimum block size: 100 KB.
  - Number of blocks: about twice the number of CPU cores.
For optimal performance, compress files larger than 1 MB.
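Putting these recommendations together, here is a minimal sketch that streams a large file through the writer, deriving concurrency from the machine's core count (the file names are placeholders):

```go
package main

import (
	"io"
	"os"
	"runtime"

	"github.com/klauspost/pgzip"
)

func main() {
	in, err := os.Open("large-input.bin") // placeholder input file
	if err != nil {
		panic(err)
	}
	defer in.Close()

	out, err := os.Create("large-input.bin.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	w := pgzip.NewWriter(out)
	// 1 MB blocks, roughly twice as many in flight as CPU cores.
	if err := w.SetConcurrency(1<<20, runtime.NumCPU()*2); err != nil {
		panic(err)
	}

	// Stream the input through the parallel writer.
	if _, err := io.Copy(w, in); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
}
```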
Parallel Decompression
Decompression works like compress/gzip but benefits from readahead to improve speed.
```go
package main

import (
	"io"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("file.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r, err := pgzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// Read the decompressed content through the parallel reader.
	content, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("file.txt", content, 0644); err != nil {
		panic(err)
	}
}
```
For finer control over concurrency, use:

```go
r, err := pgzip.NewReaderN(f, blockSize, blocks)
```

where:
- blockSize defines the size of the decompression blocks.
- blocks specifies the maximum number of blocks decoded ahead of time.
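For example, here is a minimal sketch that streams a file to stdout with explicit readahead settings (1 MB blocks and 8 blocks of readahead are illustrative values, not tuned recommendations):

```go
package main

import (
	"io"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("file.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Decode up to 8 blocks of 1 MB each ahead of the consumer.
	r, err := pgzip.NewReaderN(f, 1<<20, 8)
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// Stream the decompressed data to stdout.
	if _, err := io.Copy(os.Stdout, r); err != nil {
		panic(err)
	}
}
```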
Performance Comparison
Compression Performance (16-core CPU, GOMAXPROCS=32)
Compressor | Speed (MB/s) | Speedup | Final Size | Size Overhead |
---|---|---|---|---|
gzip (Go standard) | 16.91 | 1.0x | 4.78 GB | 0% |
gzip (klauspost) | 127.10 | 7.52x | 4.88 GB | +2.17% |
pgzip (klauspost) | 2085.35 | 123.34x | 4.88 GB | +2.19% |
Observations:
- pgzip is 123 times faster than Go’s standard gzip implementation in this test.
- The file size increase is minimal (+2.19%) compared to gzip.
- pgzip also includes a Huffman-only compression mode, reaching speeds of about 450 MB/s per core (sketched below).
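A minimal sketch of that mode, assuming pgzip re-exports the HuffmanOnly compression level from klauspost/compress/flate (treat the constant name as an assumption):

```go
package main

import (
	"bytes"

	"github.com/klauspost/pgzip"
)

func main() {
	var b bytes.Buffer

	// Huffman-only entropy coding: no match searching, so much less
	// CPU per byte, at the cost of a weaker compression ratio.
	w, err := pgzip.NewWriterLevel(&b, pgzip.HuffmanOnly)
	if err != nil {
		panic(err)
	}
	w.Write([]byte("Hello, world\n"))
	w.Close()
}
```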
Decompression Performance (4-core CPU)
Decompressor | Time | Speedup |
---|---|---|
gzip (Go standard) | 1m 28.85s | 1.0x |
pgzip (klauspost) | 43.48s | 2.04x |
Observations:
- pgzip is more than twice as fast as standard gzip in this test.
- Readahead lets pgzip decode ahead of the consumer, so reads rarely block on decompression.
- In effect, the reader also acts as a buffer, smoothing out delays between I/O and processing.
Conclusion
pgzip is an excellent choice for Go applications that require efficient parallel compression and decompression, particularly in high-performance environments.
Key advantages:
- Compatible with standard gzip: No changes needed in tools or file formats.
- Optimized for large data volumes: Best for files larger than 1 MB.
- Multicore CPU support: Uses all available processing power.
- Huffman compression for extreme speed: Up to 450 MB/s per core.
- Outperforms Go's standard gzip: about 123x faster compression and 2x faster decompression in the benchmarks above.
For any project requiring high-speed gzip operations in Go, pgzip is a top-tier solution.