pgzip is a Go library that provides parallel gzip compression and decompression. It serves as a drop-in replacement for compress/gzip, with significant performance improvements through concurrency. This makes it particularly useful for handling large amounts of data efficiently.
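Because the API mirrors the standard library, switching an existing codebase is often as simple as aliasing the import; a minimal sketch:

```go
import (
	// Alias pgzip as gzip so existing gzip.NewWriter / gzip.NewReader
	// call sites continue to compile unchanged.
	gzip "github.com/klauspost/pgzip"
)
```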
Why Use pgzip?
- Parallel processing: Splits compression into blocks processed in parallel.
- Standard gzip compatibility: Compressed files are fully compatible with any gzip reader.
- Efficient decompression: Supports readahead to minimize processing delays.
- Optimized for multicore CPUs: Maximizes compression and decompression speed.
- Automatic CRC handling: Integrity checks run in a separate goroutine.
Installation
To install the library, run:
```sh
go get github.com/klauspost/pgzip/...
```
Updating dependencies is also recommended:
```sh
go get -u github.com/klauspost/compress
```
Using pgzip in Go
Parallel Compression
pgzip works similarly to compress/gzip, with additional options for block size and parallel processing.
```go
package main

import (
	"bytes"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	var b bytes.Buffer
	w := pgzip.NewWriter(&b)

	// Set concurrency: 100 KB per block, up to 10 blocks processed in parallel.
	if err := w.SetConcurrency(100000, 10); err != nil {
		panic(err)
	}

	w.Write([]byte("Hello, world\n"))
	w.Close()

	// Save the compressed data to a file.
	if err := os.WriteFile("file.gz", b.Bytes(), 0644); err != nil {
		panic(err)
	}
}
```
Key considerations:
- SetConcurrency(blockSize, blocks): defines the block size and the number of blocks compressed in parallel.
- Recommended values:
  - Minimum block size: 100 KB.
  - Number of blocks: about twice the number of CPU cores.
For optimal performance, compress files larger than 1 MB.
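Putting these recommendations together, here is a minimal sketch that streams a large file through the writer, deriving concurrency from the machine's core count (the file names are placeholders):

```go
package main

import (
	"io"
	"os"
	"runtime"

	"github.com/klauspost/pgzip"
)

func main() {
	in, err := os.Open("large-input.bin") // placeholder input file
	if err != nil {
		panic(err)
	}
	defer in.Close()

	out, err := os.Create("large-input.bin.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	w := pgzip.NewWriter(out)
	// 1 MB blocks, roughly twice as many in flight as CPU cores.
	if err := w.SetConcurrency(1<<20, runtime.NumCPU()*2); err != nil {
		panic(err)
	}

	// Stream the input through the parallel writer.
	if _, err := io.Copy(w, in); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
}
```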
Parallel Decompression
Decompression works like compress/gzip but benefits from readahead to improve speed.
```go
package main

import (
	"io"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("file.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r, err := pgzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// Read the decompressed content through the parallel reader.
	content, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("file.txt", content, 0644); err != nil {
		panic(err)
	}
}
```
For finer control over concurrency, use:

```go
r, err := pgzip.NewReaderN(f, blockSize, blocks)
```

where:
- blockSize defines the size of the decompression blocks.
- blocks specifies the maximum number of blocks decoded ahead of time.
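For example, here is a minimal sketch that streams a file to stdout with explicit readahead settings (1 MB blocks and 8 blocks of readahead are illustrative values, not tuned recommendations):

```go
package main

import (
	"io"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("file.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Decode up to 8 blocks of 1 MB each ahead of the consumer.
	r, err := pgzip.NewReaderN(f, 1<<20, 8)
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// Stream the decompressed data to stdout.
	if _, err := io.Copy(os.Stdout, r); err != nil {
		panic(err)
	}
}
```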
Performance Comparison
Compression Performance (16-core CPU, GOMAXPROCS=32)
Compressor | Speed (MB/s) | Speedup | Final Size | Size Overhead |
---|---|---|---|---|
gzip (Go standard) | 16.91 | 1.0x | 4.78 GB | 0% |
gzip (klauspost) | 127.10 | 7.52x | 4.88 GB | +2.17% |
pgzip (klauspost) | 2085.35 | 123.34x | 4.88 GB | +2.19% |
Observations:
- pgzip is 123 times faster than Go’s standard gzip implementation in this test.
- The file size increase is minimal (+2.19%) compared to gzip.
- pgzip also includes a Huffman-only compression mode, reaching speeds of about 450 MB/s per core (sketched below).
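A minimal sketch of that mode, assuming pgzip re-exports the HuffmanOnly compression level from klauspost/compress/flate (treat the constant name as an assumption):

```go
package main

import (
	"bytes"

	"github.com/klauspost/pgzip"
)

func main() {
	var b bytes.Buffer

	// Huffman-only entropy coding: no match searching, so much less
	// CPU per byte, at the cost of a weaker compression ratio.
	w, err := pgzip.NewWriterLevel(&b, pgzip.HuffmanOnly)
	if err != nil {
		panic(err)
	}
	w.Write([]byte("Hello, world\n"))
	w.Close()
}
```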
Decompression Performance (4-core CPU)
Decompressor | Time | Speedup |
---|---|---|
gzip (Go standard) | 1m 28.85s | 1.0x |
pgzip (klauspost) | 43.48s | 2.04x |
Observations:
- pgzip is more than twice as fast as standard gzip in this test.
- Readahead lets pgzip decode ahead of the consumer, so reads rarely block on decompression.
- In effect, the reader also acts as a buffer, smoothing out delays between I/O and processing.
Conclusion
pgzip is an excellent choice for Go applications that require efficient parallel compression and decompression, particularly in high-performance environments.
Key advantages:
- Compatible with standard gzip: No changes needed in tools or file formats.
- Optimized for large data volumes: Best for files larger than 1 MB.
- Multicore CPU support: Uses all available processing power.
- Huffman compression for extreme speed: Up to 450 MB/s per core.
- Outperforms Go's standard gzip: about 123x faster compression and 2x faster decompression in the benchmarks above.
For any project requiring high-speed gzip operations in Go, pgzip is a top-tier solution.