GitHub - wdm0006/git-pandas: A wrapper around gitpython to produce pandas dataframes for analysis

Git-Pandas


Git-Pandas is a Python library that transforms Git repository data into pandas DataFrames, making it easier to analyze and visualize a codebase's history, contributors, and development patterns. Built on top of GitPython, it provides a simple interface for extracting insights from Git repositories.

Figure: Cumulative Blame

Why Git-Pandas?

Core Components

Repository

The Repository class wraps a single Git repository and offers methods for commit history, file change history, blame, bus factor, and punchcard analysis (see Available Methods).

ProjectDirectory

The ProjectDirectory class enables analysis across multiple repositories.

Key Features

Repository Analysis

Project Insights

GitHub Integration

Visualization Tools

Installation

Git-Pandas supports Python 2.7+ and 3.3+. Install using pip:

pip install git-pandas

Quick Start

from gitpandas import Repository

# Analyze a single repository
repo = Repository('/path/to/repo')

# Get commit history with filtering
commits_df = repo.commit_history(
    branch='main',
    ignore_globs=['*.pyc'],
    include_globs=['*.py']
)

# Analyze blame information
blame_df = repo.blame(by='repository')

# Calculate bus factor
bus_factor_df = repo.bus_factor()

# Analyze multiple repositories
from gitpandas import ProjectDirectory
project = ProjectDirectory('/path/to/project')
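To make the bus factor idea concrete, here is a minimal standard-library sketch of one common definition: the smallest number of contributors whose combined lines of code reach a given share of the total. The function name `bus_factor`, the `threshold` parameter, and the sample data are assumptions for illustration only; Git-Pandas may compute its own metric differently.

```python
from collections import Counter


def bus_factor(loc_by_author, threshold=0.5):
    """Return the smallest number of authors whose lines reach `threshold` of the total.

    `loc_by_author` maps an author name to the number of lines attributed
    to that author (for example, from blame output). Hypothetical helper,
    not part of Git-Pandas.
    """
    total = sum(loc_by_author.values())
    if total == 0:
        return 0

    covered = 0
    # Walk authors from largest to smallest contribution.
    ranked = Counter(loc_by_author).most_common()
    for count, (_, loc) in enumerate(ranked, start=1):
        covered += loc
        if covered >= total * threshold:
            return count
    return len(loc_by_author)


# Made-up blame totals for three contributors.
blame = {"alice": 600, "bob": 250, "carol": 150}
print(bus_factor(blame))
print(bus_factor(blame, threshold=0.8))
```

A low value suggests that knowledge of the code is concentrated in a few people.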

Available Methods

Repository Class

Core Analysis

repo.commit_history(branch=None, limit=None, days=None, ignore_globs=None, include_globs=None)
repo.file_change_history(branch=None, limit=None, days=None, ignore_globs=None, include_globs=None)
repo.blame(rev="HEAD", committer=True, by="repository", ignore_globs=None, include_globs=None)
repo.bus_factor(by="repository", ignore_globs=None, include_globs=None)
repo.punchcard(branch=None, limit=None, days=None, by=None, normalize=None, ignore_globs=None, include_globs=None)
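A punchcard summarizes when commits happen by counting them per weekday and hour. The sketch below shows that aggregation using only the standard library; the `punchcard` function and the sample timestamps are hypothetical and are not the library's implementation or output format.

```python
from collections import Counter
from datetime import datetime


def punchcard(timestamps):
    """Count commits per (weekday, hour) cell; weekday 0 is Monday.

    Hypothetical helper for illustration, not part of Git-Pandas.
    """
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


# Made-up commit timestamps.
commits = [
    datetime(2024, 1, 1, 9, 15),   # Monday morning
    datetime(2024, 1, 1, 9, 45),   # Monday morning, same hour
    datetime(2024, 1, 3, 14, 5),   # Wednesday afternoon
]

card = punchcard(commits)
for (weekday, hour), n in sorted(card.items()):
    print(weekday, hour, n)
```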

Repository Information

repo.list_files(rev="HEAD")
repo.has_branch(branch)
repo.is_bare()
repo.has_coverage()
repo.coverage()
repo.get_commit_content(rev, ignore_globs=None, include_globs=None)

ProjectDirectory Class

# Initialize with multiple repositories
project = ProjectDirectory(
    working_dir='/path/to/project',
    ignore_repos=None,
    verbose=True,
    cache_backend=None,
    default_branch='main'
)

Common Parameters

Most analysis methods support these filtering parameters: branch, limit, days, ignore_globs, and include_globs.
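The `ignore_globs` and `include_globs` parameters take lists of glob patterns such as `*.py`. As a rough illustration of how include and ignore filters are often combined, the sketch below uses `fnmatch` from the standard library. The `filter_paths` helper and the rule that ignore patterns take precedence over include patterns are assumptions, not a description of how Git-Pandas resolves the two lists.

```python
from fnmatch import fnmatch


def filter_paths(paths, include_globs=None, ignore_globs=None):
    """Keep paths matching any include glob and no ignore glob.

    Hypothetical helper for illustration. With no include globs, every
    path is a candidate; ignore globs are applied last (an assumption).
    """
    kept = []
    for path in paths:
        if include_globs and not any(fnmatch(path, g) for g in include_globs):
            continue
        if ignore_globs and any(fnmatch(path, g) for g in ignore_globs):
            continue
        kept.append(path)
    return kept


# Made-up file listing.
files = [
    "setup.py",
    "gitpandas/repository.py",
    "gitpandas/repository.pyc",
    "README.md",
]

print(filter_paths(files, include_globs=["*.py"]))
print(filter_paths(files, ignore_globs=["*.pyc"]))
```

Note that `fnmatch` lets `*` match path separators, so `*.py` also matches files in subdirectories.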

Contributing

We welcome contributions! Please review our Contributing Guidelines for details.

License

This project is BSD licensed (see LICENSE.md)