pgzip is a Go library that provides parallel gzip compression and decompression. It serves as a drop-in replacement for compress/gzip, with significant performance improvements through concurrency. This makes it particularly useful for handling large amounts of data efficiently.


Why Use pgzip?

  • Parallel processing: Splits compression into blocks processed in parallel.
  • Standard gzip compatibility: Compressed files are fully compatible with any gzip reader (see the round-trip sketch after this list).
  • Efficient decompression: Supports readahead to minimize processing delays.
  • Optimized for multicore CPUs: Maximizes compression and decompression speed.
  • Automatic CRC handling: Integrity checks run in a separate goroutine.
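
Because the output is a standard gzip stream, anything pgzip writes can be read back with Go's own compress/gzip. A minimal round-trip sketch (error handling kept short for brevity):

package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "io"

    "github.com/klauspost/pgzip"
)

func main() {
    // Compress with pgzip...
    var buf bytes.Buffer
    w := pgzip.NewWriter(&buf)
    w.Write([]byte("round-trip test\n"))
    w.Close()

    // ...and read it back with the standard library's gzip reader.
    r, err := gzip.NewReader(&buf)
    if err != nil {
        panic(err)
    }
    defer r.Close()

    out, err := io.ReadAll(r)
    if err != nil {
        panic(err)
    }
    fmt.Print(string(out)) // prints: round-trip test
}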

Installation

To install the library, run:

go get github.com/klauspost/pgzip/...

Updating dependencies is also recommended:

go get -u github.com/klauspost/compress

Using pgzip in Go

Parallel Compression

pgzip works similarly to compress/gzip, with additional options for block size and parallel processing.

package main

import (
    "bytes"
    "os"

    "github.com/klauspost/pgzip"
)

func main() {
    var b bytes.Buffer
    w := pgzip.NewWriter(&b)

    // Set concurrency: 100 KB per block, up to 10 blocks processed in parallel
    w.SetConcurrency(100000, 10)

    w.Write([]byte("Hello, world\n"))
    w.Close()

    // Save compressed file
    os.WriteFile("file.gz", b.Bytes(), 0644)
}

Key considerations:

  • SetConcurrency(blockSize, blocks): Defines block size and the number of concurrent blocks.
  • Recommended values:
    • Minimum block size: 100 KB.
    • Number of blocks: About twice the number of CPU cores.

For optimal performance, compress files larger than 1 MB.
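
Putting those recommendations together, here is a sketch that sizes SetConcurrency from the machine's core count; the 1 MB block size is an illustrative choice, not a value mandated by the library:

package main

import (
    "os"
    "runtime"

    "github.com/klauspost/pgzip"
)

func main() {
    out, err := os.Create("file.gz")
    if err != nil {
        panic(err)
    }
    defer out.Close()

    w := pgzip.NewWriter(out)
    defer w.Close()

    // 1 MB blocks, roughly two blocks in flight per CPU core.
    if err := w.SetConcurrency(1<<20, runtime.NumCPU()*2); err != nil {
        panic(err)
    }

    w.Write([]byte("some large payload...\n"))
}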


Parallel Decompression

Decompression works like compress/gzip but benefits from readahead to improve speed.

package main

import (
    "io"
    "os"

    "github.com/klauspost/pgzip"
)

func main() {
    f, err := os.Open("file.gz")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    r, err := pgzip.NewReader(f)
    if err != nil {
        panic(err)
    }
    defer r.Close()

    // Read the decompressed stream through the parallel reader.
    content, err := io.ReadAll(r)
    if err != nil {
        panic(err)
    }

    os.WriteFile("file.txt", content, 0644)
}
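
If the input is large, streaming avoids holding the whole decompressed file in memory. Replacing the io.ReadAll step above with io.Copy:

out, err := os.Create("file.txt")
if err != nil {
    panic(err)
}
defer out.Close()

// Stream decompressed data from the reader straight to disk.
if _, err := io.Copy(out, r); err != nil {
    panic(err)
}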

For finer control over concurrency, use:

r, err := pgzip.NewReaderN(f, blockSize, blocks)

where:

  • blockSize defines the size of decompression blocks.
  • blocks specifies the maximum number of blocks decoded ahead of time.
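
For example, with illustrative values (1 MB blocks, up to 8 blocks decoded ahead):

r, err := pgzip.NewReaderN(f, 1<<20, 8)
if err != nil {
    panic(err)
}
defer r.Close()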

Performance Comparison

Compression Performance (16-core CPU, GOMAXPROCS=32)

Compressor           Speed (MB/s)    Speedup   Final Size   Size Overhead
gzip (Go standard)   16.91 MB/s      1.0x      4.78 GB      0%
gzip (klauspost)     127.10 MB/s     7.52x     4.88 GB      +2.17%
pgzip (klauspost)    2085.35 MB/s    123.34x   4.88 GB      +2.19%

Observations:

  • pgzip is 123 times faster than Go’s standard gzip implementation in this test.
  • The file size increase is minimal (+2.19%) compared to gzip.
  • pgzip includes a Huffman-only compression mode, reaching speeds of about 450 MB/s per core (see the sketch below).
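
In pgzip this mode is exposed as the ConstantCompression level (Huffman coding only, trading ratio for speed); a minimal sketch of selecting it:

w, err := pgzip.NewWriterLevel(out, pgzip.ConstantCompression)
if err != nil {
    panic(err)
}
defer w.Close()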

Decompression Performance (4-core CPU)

Decompressor         Time        Speedup
gzip (Go standard)   1m 28.85s   1.0x
pgzip (klauspost)    43.48s      2.04x

Observations:

  • pgzip decompresses more than twice as fast as standard gzip in this test.
  • Readahead decodes blocks before they are requested, so reads are not blocked waiting on I/O.
  • The decoded-ahead blocks effectively act as a buffer, smoothing out delays for the consumer.

Conclusion

pgzip is an excellent choice for Go applications that require efficient parallel compression and decompression, particularly in high-performance environments.

Key advantages:

  • Compatible with standard gzip: No changes needed in tools or file formats.
  • Optimized for large data volumes: Best for files larger than 1 MB.
  • Multicore CPU support: Uses all available processing power.
  • Huffman compression for extreme speed: Up to 450 MB/s per core.
  • Outperforms Go’s standard gzip: 123x faster compression and 2x faster decompression in the benchmarks above.

For any project requiring high-speed gzip operations in Go, pgzip is a top-tier solution.
