GitHub - sebyx07/zsv-ruby: SIMD-accelerated CSV parser for Ruby - 5-6x faster than CSV stdlib (original) (raw)
ZSV - SIMD-Accelerated CSV Parser for Ruby β‘
A drop-in replacement for Ruby's CSV stdlib that uses the zsv C library for 5-6x performance improvements via SIMD optimizations.
π€ Built with Claude Code
π Documentation
- Quick Start Guide - Get started in 5 minutes
- API Reference - Complete API documentation
- Verification Report - Test results and metrics
β¨ Features
- Blazing Fast: 5-6x faster than Ruby's CSV stdlib thanks to SIMD optimizations
- Memory Efficient: Streaming parser that doesn't load entire files into memory
- API Compatible: Familiar interface matching Ruby's CSV class
- Native Extension: Direct C integration for minimal overhead
- Ruby 3.3+: Modern Ruby support with proper encoding handling
π¦ Installation
Add to your Gemfile:
Or install directly:
The gem will automatically download and compile zsv 1.3.0 during installation.
π Usage
Basic Parsing
require 'zsv'
Parse entire file
rows = ZSV.read("data.csv")
=> [["a", "b", "c"], ["1", "2", "3"]]
Stream rows (memory efficient)
ZSV.foreach("large_file.csv") do |row| puts row.inspect end
Parse string
rows = ZSV.parse("a,b,c\n1,2,3\n")
Headers Mode
Use first row as headers
ZSV.foreach("data.csv", headers: true) do |row| puts row["name"] # Hash access end
Provide custom headers
ZSV.foreach("data.csv", headers: ["id", "name", "email"]) do |row| puts row["name"] end
Parser Instance
Create parser
parser = ZSV.open("data.csv", headers: true)
Read rows one at a time
row = parser.shift row = parser.shift
Iterate all rows
parser.each do |row| puts row end
Rewind to beginning
parser.rewind
Clean up
parser.close
Or use block form (auto-closes)
ZSV.open("data.csv") do |parser| parser.each { |row| puts row } end
Enumerable Methods
The parser includes Enumerable, so you can use map, select, find, etc.:
Transform rows
names = ZSV.open("users.csv", headers: true) do |parser| parser.map { |row| row["name"].upcase } end
Filter rows
adults = ZSV.open("users.csv", headers: true) do |parser| parser.select { |row| row["age"].to_i >= 18 } end
Find first match
admin = ZSV.open("users.csv", headers: true) do |parser| parser.find { |row| row["role"] == "admin" } end
Options
All parsing methods accept these options:
| Option | Type | Default | Description |
|---|---|---|---|
| headers | Boolean/Array | false | Use first row as headers or provide custom headers |
| col_sep | String | "," | Column delimiter (single character) |
| quote_char | String | "\"" | Quote character (single character) |
| skip_lines | Integer | 0 | Number of lines to skip at start |
| encoding | Encoding | UTF-8 | Source encoding |
| liberal_parsing | Boolean | false | Handle malformed CSV gracefully |
| buffer_size | Integer | 262144 | Buffer size in bytes (256KB default) |
Tab-separated values
ZSV.foreach("data.tsv", col_sep: "\t") { |row| puts row }
Pipe-separated values
ZSV.parse("a|b|c\n1|2|3", col_sep: "|")
Skip header comment lines
ZSV.foreach("data.csv", skip_lines: 2) { |row| puts row }
β‘ Performance
Benchmarks comparing ZSV vs Ruby CSV stdlib (Ruby 3.4.7):
=== Small file (1K rows, 5 cols) ===
CSV (stdlib): 163.4 i/s
ZSV: 1,013.7 i/s - 6.20x faster
=== Medium file (10K rows, 10 cols) ===
CSV (stdlib): 10.3 i/s
ZSV: 54.5 i/s - 5.27x faster
=== Large file (100K rows, 10 cols) ===
CSV (stdlib): 1.1 i/s
ZSV: 5.3 i/s - 5.00x faster
=== With headers (10K rows) ===
CSV (stdlib): 7.8 i/s
ZSV: 33.8 i/s - 4.33x faster
Memory Usage
ZSV uses significantly less memory than Ruby's CSV stdlib:
=== Memory Usage (100K rows) ===
CSV stdlib: 56.8 MB
ZSV: 9.9 MB - 82.6% less memory
=== String Allocations (10K rows) ===
CSV stdlib: 116,144 strings
ZSV: 50,005 strings - 56.9% fewer allocations
ZSV achieves ~6x lower memory usage through frozen strings and efficient C-level memory management.
Run benchmarks yourself:
bundle exec rake bench bundle exec ruby benchmark/memory_bench.rb
API Reference
Module Methods
ZSV.foreach(path, **options) { |row| }
Stream rows from a CSV file. Returns an Enumerator if no block given.
ZSV.parse(string, **options) -> Array
Parse CSV string and return all rows as an array.
ZSV.read(path, **options) -> Array
Read entire CSV file into an array.
ZSV.open(path, mode="r", **options) -> Parser
Open a CSV file and return a Parser instance. If a block is given, the parser is automatically closed after the block completes.
ZSV.new(io, **options) -> Parser
Create a Parser from any IO-like object.
Parser Instance Methods
#shift -> Array|Hash|nil
Read and return the next row. Returns nil at EOF.
#each { |row| } -> self
Iterate over all rows. Returns Enumerator without block.
#rewind -> nil
Reset parser to the beginning (file-based parsers only).
#close -> nil
Close parser and release resources.
#headers -> Array|nil
Return headers if header mode is enabled.
#closed? -> Boolean
Check if parser is closed.
#read -> Array
Read all remaining rows into an array.
Exception Classes
ZSV::Error- Base exception classZSV::MalformedCSVError- Raised on CSV parsing errorsZSV::InvalidEncodingError- Raised on encoding issues
Architecture
The gem follows SOLID principles with clear separation of concerns:
ext/zsv/
βββ zsv_ext.c # Main extension entry point, Ruby API
βββ parser.c/h # Parser state management and zsv wrapper
βββ row.c/h # Row building and conversion (arrays/hashes)
βββ options.c/h # Option parsing and validation
βββ common.h # Shared types and macros
Design Principles
- Single Responsibility: Each C module handles one concern
- Streaming First: Never load entire files into memory
- Zero-Copy Where Possible: Minimize data copying
- Proper Resource Management: RAII-style cleanup with Ruby GC
π οΈ Development
Clone and setup
git clone https://github.com/sebyx07/zsv-ruby.git cd zsv-ruby bundle install
Compile extension
bundle exec rake compile
Run tests
bundle exec rake spec
Run benchmarks
bundle exec rake bench
Clean build artifacts
bundle exec rake clean
Running Tests
The test suite includes:
- Basic parsing tests
- Header mode tests
- Custom delimiter tests
- Error handling tests
- Memory leak detection
- API compatibility tests
Compatibility
- Ruby: 3.3+ required
- Platforms: Linux, macOS (ARM and x86)
- ZSV: Compiles against zsv 1.3.0
π€ Contributing
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Write tests for your changes
- Ensure tests pass (
bundle exec rake spec) - Commit your changes (
git commit -am 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
License
MIT License - see LICENSE file for details.
π Credits
- Built on zsv by liquidaty
- Inspired by Ruby's CSV stdlib
- SIMD optimizations courtesy of zsv's excellent engineering
- Developed with Claude Code
πΊοΈ Roadmap
Phase 1: Core Parser (Current)
- Basic parsing (foreach, parse, read)
- Header mode
- Custom delimiters
- File and string input
Phase 2: CSV Stdlib Compatibility
- Type converters (
:numeric,:date,:date_time) - Header converters (
:downcase,:symbol) unconverted_fieldsoption
π¬ Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Upstream zsv: zsv repository