RLE vs. Other Compression Methods: Pros, Cons, and Use Cases

Implementing RLE in Python: A Step-by-Step Guide with Examples

Run-Length Encoding (RLE) is a simple lossless compression technique that replaces consecutive repeated values with a single value and a count. It’s especially effective on data with long runs of repeated elements (e.g., simple images, binary masks, or repetitive text). This guide shows a clear, step-by-step implementation in Python with examples for strings, lists, and binary image masks.

When to use RLE

Effective: data with long repeated runs (e.g., black/white images, simple sensor streams).
Not effective: high-entropy data (compressed files, encrypted streams, photos with lots of detail).

1. RLE for strings (basic)

Python implementation that compresses a string of characters into (char,count) pairs encoded as a compact string.

Code (compress and decompress):

python
def rle_compress_string(s: str) -> str:
if not s:
        return ””
    result = []
    prev = s[0]
    count = 1
    for ch in s[1:]:
        if ch == prev:
            count += 1
        else:
            result.append(f”{prev}{count}“)
            prev = ch             count = 1
    result.append(f”{prev}{count}“)
    return ””.join(result)

def rle_decompressstring(rle: str) -> str:
    if not rle:
        return ””
    import re     pairs = re.findall(r’(.)(\d+)’, rle)
    return ””.join(ch * int(cnt) for ch, cnt in pairs)

Example:

python
s = “AAAABBBCCDAA” c = rle_compress_string(s) # “A4B3C2D1A2” orig = rle_decompressstring(c) # “AAAABBBCCDAA”

Notes:

This encoding assumes the character set excludes digits as separate tokens or uses a clear parsing scheme. For general use, separate value and count with a delimiter or use structured storage.

2. RLE for lists (generic values)

Handles arbitrary hashable values (ints, tuples). Output is list of (value, count) tuples.

Code:

python
from typing import List, Tuple, TypeVar T = TypeVar(’T’) def rle_compress_list(data: List[T]) -> List[Tuple[T, int]]: if not data: return [] out = [] prev = data[0] count = 1 for x in data[1:]: if x == prev: count += 1 else: out.append((prev, count)) prev = x count = 1 out.append((prev, count)) return out def rle_decompresslist(rle: List[Tuple[T, int]]) -> List[T]: out = [] for val, cnt in rle: out.extend([val] * cnt) return out

Example:

python
data = [1,1,1,2,2,3,3,3,3] c = rle_compress_list(data) # [(1,3),(2,2),(3,4)] orig = rle_decompresslist(c) # [1,1,1,2,2,3,3,3,3]

3. RLE for binary image masks (2D arrays)

Common for compressing binary masks (0/1). Flatten the array (row-major) and compress runs. Store width/height or use scanline boundaries if needed.

Code using numpy:

python
import numpy as np from typing import Tuple, List def rle_compress_mask(mask: np.ndarray) -> List[int]: # mask: 2D binary array (0/1) flat = mask.ravel(order=‘C’) # row-major if flat.size == 0: return [] runs = [] prev = int(flat[0]) count = 1 for b in flat[1:]: b = int(b) if b == prev: count += 1 else: runs.append(prev) runs.append(count) prev = b count = 1 runs.append(prev) runs.append(count) return runs def rle_decompressmask(runs: List[int], shape: Tuple[int, int]) -> np.ndarray: flat = [] it = iter(runs) for val, cnt in zip(it, it): flat.extend([val] * cnt) arr = np.array(flat, dtype=np.uint8) return arr.reshape(shape, order=‘C’)

Example:

python
mask = np.array([[1,1,0],[0,0,0],[1,1,1]]) runs = rle_compress_mask(mask) # 1,2,0,1,0,3,1,3 recon = rle_decompressmask(runs, mask.shape)

Optimization note:

Often store only lengths of 1-runs or 0-runs depending on convention (e.g., COCO run-length uses start positions and lengths). Choose representation that suits your storage/interop needs.

4. Handling edge cases

Large counts: use integers (no overflow in Python). If serializing to text, ensure counts are unambiguous (use delimiters or fixed-width).

Mixed data types: serialize values before compression (e.g., JSON, bytes).

Streaming data: emit runs as they complete; keep current run state between chunks.

5. Performance tips

For large numeric arrays, use numpy vectorized methods to find run boundaries (diff != 0) for faster compression.

For bytes, using Python’s bytes/bytearray and memoryviews is faster than lists of ints.

When storing compressed data, consider binary formats (struct, protobuf) rather than string concatenation.

6. Summary example: compact byte-oriented RLE

Simple bytes compressor/decompressor:

python
def rle_compress_bytes(data: bytes) -> bytes: if not data: return b” out = bytearray() prev = data[0] count = 1 for b in data[1:]: if b == prev and count < 255: count += 1 else: out.extend([prev, count]) prev = b count = 1 out.extend([prev, count]) return bytes(out) def rle_decompress_bytes(r: bytes) -> bytes: it = iter(r) out = bytearray() for val, cnt in zip(it, it): out.extend([val] * cnt) return bytes(out)

Use-case: small embedded systems or simple image formats; note the 255-count cap — extend with multi-byte counts if needed.

Further reading: look up run-length encoding variations (packbits, TIFF RLE, COCO mask RLE) when targeting specific file formats.

RLE vs. Other Compression Methods: Pros, Cons, and Use Cases

Implementing RLE in Python: A Step-by-Step Guide with Examples

When to use RLE

1. RLE for strings (basic)

2. RLE for lists (generic values)

3. RLE for binary image masks (2D arrays)

4. Handling edge cases

5. Performance tips

6. Summary example: compact byte-oriented RLE

Comments

Leave a Reply Cancel reply

More posts

PC Confidential — The Ultimate Guide to Secure Home Networks

CallZap Setup Guide: From Signup to First Automated Call

ChrisPC Free VideoTube Downloader Review: Features, Pros & Cons

Emergency Removal: W32.Blaster Worm Tool to Restore Your PC