fastlmm/bed-reader: A library for easy, fast, and efficient reading & writing of PLINK Bed files


Read and write the PLINK BED format, simply and efficiently.

This is the Python README. For Rust, see README-rust.md.

Highlights

Install

Full version: With all optional dependencies:

pip install bed-reader[samples,sparse]

Minimal version: Depends only on numpy:

pip install bed-reader

Usage

Read genomic data from a .bed file.

    >>> import numpy as np
    >>> from bed_reader import open_bed, sample_file
    >>>
    >>> file_name = sample_file("small.bed")
    >>> bed = open_bed(file_name)
    >>> val = bed.read()
    >>> print(val)
    [[ 1.  0. nan  0.]
     [ 2.  0. nan  2.]
     [ 0.  1.  2.  0.]]
    >>> del bed
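Missing genotypes come back as `nan`, so ordinary NumPy reductions need the nan-aware variants. The sketch below hard-codes the 3x4 array printed above instead of opening a file, and the names `missing_per_snp` and `mean_per_snp` are illustrative only, not part of bed-reader.

```python
import numpy as np

# The array printed above, typed in by hand (rows: individuals, columns: SNPs).
val = np.array(
    [
        [1.0, 0.0, np.nan, 0.0],
        [2.0, 0.0, np.nan, 2.0],
        [0.0, 1.0, 2.0, 0.0],
    ]
)

# Count the missing calls in each SNP column.
missing_per_snp = np.isnan(val).sum(axis=0)

# Average each SNP column while skipping the missing calls.
mean_per_snp = np.nanmean(val, axis=0)

print(missing_per_snp)
print(mean_per_snp)
```

Using `np.mean` instead of `np.nanmean` here would yield `nan` for the third SNP, because that column contains missing values.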

Read every second individual and the SNPs (variants) at index positions 20 through 29 (the slice 20:30 excludes its end).

    >>> file_name2 = sample_file("some_missing.bed")
    >>> bed2 = open_bed(file_name2)
    >>> val2 = bed2.read(index=np.s_[::2,20:30])
    >>> print(val2.shape)
    (50, 10)
    >>> del bed2
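`np.s_` is a NumPy helper that turns slice syntax into an ordinary tuple of `slice` objects, which is what gets passed as `index`. The sketch below shows what that tuple looks like and applies it to a stand-in array. The stand-in's 100x100 shape is an assumption made for illustration, chosen so the result matches the `(50, 10)` shape printed above.

```python
import numpy as np

# np.s_ builds the index object without needing an array to index.
index = np.s_[::2, 20:30]
print(index)

# Stand-in for the file contents (assumed 100 individuals x 100 SNPs).
stand_in = np.zeros((100, 100), dtype="float32")

# Rows 0, 2, 4, ... and columns 20 through 29.
subset = stand_in[index]
print(subset.shape)
```

Because the index is a plain Python object, it can be built once, stored in a variable, and reused across several reads.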

List the first 5 individual (sample) ids, the first 5 SNP (variant) ids, and every unique chromosome. Then, read every genomic value in chromosome 5.

    >>> with open_bed(file_name2) as bed3:
    ...     print(bed3.iid[:5])
    ...     print(bed3.sid[:5])
    ...     print(np.unique(bed3.chromosome))
    ...     val3 = bed3.read(index=np.s_[:,bed3.chromosome=='5'])
    ...     print(val3.shape)
    ['iid_0' 'iid_1' 'iid_2' 'iid_3' 'iid_4']
    ['sid_0' 'sid_1' 'sid_2' 'sid_3' 'sid_4']
    ['1' '10' '11' '12' '13' '14' '15' '16' '17' '18' '19' '2' '20' '21' '22'
     '3' '4' '5' '6' '7' '8' '9']
    (100, 6)
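The chromosome selection above relies on a boolean mask: comparing the string array to `'5'` gives one `True`/`False` per SNP. The sketch below reproduces that step on a small hypothetical label array (not taken from the sample file). It also shows why `np.unique` lists `'10'` before `'2'`: the labels are strings, so they sort lexicographically.

```python
import numpy as np

# Hypothetical per-SNP chromosome labels, stored as strings.
chromosome = np.array(["1", "5", "10", "5", "2", "5"])

# One boolean per SNP; True where the SNP lies on chromosome 5.
mask = chromosome == "5"

# The integer positions that the mask selects.
snp_positions = np.flatnonzero(mask)
print(snp_positions)

# String labels sort lexicographically ...
lexicographic = np.unique(chromosome)
print(lexicographic)

# ... so convert to int when numeric order is wanted.
numeric_order = sorted(lexicographic, key=int)
print(numeric_order)
```

The `key=int` sort only works while every label is numeric. Labels such as `'X'` or `'MT'` would need a custom sort key.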

From the cloud: open a file and read data for one SNP (variant) at index position 2.

    >>> with open_bed("https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/small.bed") as bed:
    ...     val = bed.read(index=np.s_[:,2], dtype="float64")
    ...     print(val)
    [[nan]
     [nan]
     [ 2.]]
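The output above is a 3x1 column, so the read appears to keep both dimensions even when the SNP index is a single integer. Plain NumPy indexing with the same integer drops that axis. The sketch below contrasts the two on a hand-typed copy of the small array printed earlier. It uses NumPy only and does not touch the network.

```python
import numpy as np

# Hand-typed copy of the small 3x4 example array.
val = np.array(
    [
        [1.0, 0.0, np.nan, 0.0],
        [2.0, 0.0, np.nan, 2.0],
        [0.0, 1.0, 2.0, 0.0],
    ]
)

# Plain NumPy: an integer column index yields a 1-D result.
flat = val[np.s_[:, 2]]
print(flat.shape)

# A one-element list keeps the column axis, matching the 2-D output above.
column = val[:, [2]]
print(column.shape)
print(column)
```

If downstream code expects a 1-D vector, calling `.ravel()` on the 2-D result is one way to flatten it.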