Integrating MegaLinter to Automate Linting Across Multiple Codebases. A Technical Description.

Thorsten Foltz

Working as a team on a shared codebase is often challenging

If you’re not familiar with linters, or specifically with MegaLinter, please take a look at my previous article on the topic. In contrast to the previous article, this one focuses on implementing several linters using MegaLinter for Python, Docker, SQL, YAML, Bash, JSON, Markdown, Make, Terraform, the repository itself, and spellchecking. Additionally, an approach is demonstrated to implement SQLFluff not only for SQL but also for dbt. A shell script is also introduced, which tests for specific patterns in Git branch names. Finally, all of these are integrated into a pipeline on Azure DevOps for use within a CI/CD process. The following linters will be used:

Set a MegaLinter configuration file

In the root of your repository, create a file named .mega-linter.yml. This file contains the configuration for MegaLinter itself and for many, but not all, of the linters. My file:

# Configuration file for MegaLinter
# See all available variables at https://megalinter.io/configuration/
# and in linters documentation

APPLY_FIXES: none # all, none, or list of linter keys
BASH_SHELLCHECK_ARGUMENTS: -e "SC2162"
BASH_EXEC_FILTER_REGEX_EXCLUDE: "dbt_packages"
BASH_SHELLCHECK_FILTER_REGEX_EXCLUDE: "dbt_packages"
CLEAR_REPORT_FOLDER: true
DISABLE_ERRORS: false
ENABLE_LINTERS:

Let me briefly explain my settings:

In the following, any linter that simply searches for its configuration file in the .linter directory will not be described. The same applies to cases where only a specific directory is ignored from linting.
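To make one of the settings above concrete: BASH_SHELLCHECK_ARGUMENTS: -e "SC2162" tells ShellCheck to skip rule SC2162, which warns that read without -r treats backslashes as escape characters. The following is only a minimal sketch of what that rule is about; the two function names are made up for this example and are not part of my repository.

```shell
#!/bin/bash
# Illustration of ShellCheck rule SC2162 (hypothetical helper names).

# Reads one line from stdin with -r: backslashes are kept as-is.
read_raw() {
    IFS= read -r line
    printf '%s\n' "$line"
}

# Reads one line from stdin without -r: backslashes are consumed as escapes.
# shellcheck disable=SC2162
read_plain() {
    IFS= read line
    printf '%s\n' "$line"
}

# Example calls:
#   printf '%s\n' 'C:\temp\new' | read_raw     # keeps the backslashes
#   printf '%s\n' 'C:\temp\new' | read_plain   # drops them
```

If your scripts never read input that may contain backslashes, excluding the rule is a reasonable trade-off; otherwise it is probably safer to keep it enabled.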

Set Linter Configuration

Some linters can be configured within the .mega-linter.yml file, while others require separate configuration files, which can often be placed in a custom directory, for example .linters in my case. However, some linters strictly require their files to be at the root of the repository. Let's start with those. Strictly speaking, they are ignore files rather than configuration files: .semgrepignore, .sqlfluffignore, and .trivyignore.
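Because it is easy to drop an ignore file into .linters out of habit, a small check can help. The sketch below reports ignore files that exist only inside .linters; the function name and the directory argument are assumptions for this example.

```shell
#!/bin/bash
# Sketch: report ignore files that were placed in .linters/ by mistake.
# The function name and the optional directory argument are hypothetical.

misplaced_ignore_files() {
    repo_root=${1:-.}
    for name in .semgrepignore .sqlfluffignore .trivyignore; do
        # As described above, these files are expected at the repository root.
        if [ -f "$repo_root/.linters/$name" ] && [ ! -f "$repo_root/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Calling misplaced_ignore_files from the repository root prints one file name per line, or nothing if everything is where it should be.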

My configuration files in the .linters directory are:

.bandit.yml, which excludes certain directories and skips rule B101.

#FILE: bandit.yml
exclude_dirs: ["venv", "megalinter-reports", "dbt_packages"]
#tests: ['B201', 'B301']
skips: ["B101"]

.checkov.yml, which just ignores the dbt_packages directory.

skip-path:

.flake8 configures the well-known Python linter: it ignores rules E501 and F821 and excludes two paths.

[flake8]
extend-ignore = E501, F821
exclude =
    test_copy_files.py,
    __pycache__

kics.config is set to check Docker and Terraform files, with the LTS Spark version check ignored.

verbose: true
type:

.ls-lint.yml sets naming rules per file extension. In my case, file names must always be lowercase or snake_case. Some directories are excluded.

ls:
  .sql: lowercase | snake_case
  .py: lowercase | snake_case
  .tf: lowercase | snake_case
  .sh: lowercase | snake_case
  .yml: lowercase | snake_case
  .json: lowercase | snake_case
  .txt: lowercase | snake_case
  .hcl: lowercase | snake_case
  .toml: lowercase | snake_case
  .config: lowercase | snake_case
  .env: lowercase | snake_case

ignore:
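To illustrate what the rule lowercase | snake_case in the ls section above means for a single file name, here is a rough approximation in shell. It is not ls-lint's own implementation, and the three function names are invented for this sketch; I am assuming that a file passes as soon as one of the two rules is satisfied.

```shell
#!/bin/bash
# Rough approximation of the rule "lowercase | snake_case" (not ls-lint's own code).

# lowercase: the name contains no uppercase letter.
is_lowercase() {
    case $1 in
        *[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# snake_case: only lowercase letters, digits, and underscores.
is_snake_case() {
    case $1 in
        '' | *[![:lower:][:digit:]_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# A file passes when its name without the extension satisfies either rule.
file_name_ok() {
    base=${1##*/}     # drop the directory part
    stem=${base%.*}   # drop the last extension
    is_lowercase "$stem" || is_snake_case "$stem"
}
```

With this reading, models/stg_orders.sql passes, while models/StgOrders.sql is rejected because of its uppercase letters.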

lychee.toml accepts response codes 200 and 429 and excludes certain URLs, files, and paths from the check.

# Accepts log level: "error", "warn", "info", "debug", "trace"
verbose = "info"

# Don't show interactive progress bar while checking links.
no_progress = true

accept = ["200", "429"]

exclude = [
"https://megalinter.io/configuration/",
"file:///tmp/lint/dwh/models/logo.png"
]

exclude_path = [
"logs",
"megalinter-reports",
".venv",
"dbt_packages"
]

.markdown-link-check.json just ignores certain patterns and treats certain status codes as alive.

{
"ignorePatterns": [
{
"pattern": "logo.png"
}
],
"retryOn429": true,
"retryCount": 5,
"aliveStatusCodes": [0, 200, 203, 404]
}

.markdownlint.json enables or disables certain rules.

{
"MD004": false,
"MD007": {
"indent": 2
},
"MD013": {
"line_length": 500,
"code_blocks": false
},
"MD026": {
"punctuation": ".,;:!。,;:"
},
"MD029": false,
"MD033": false,
"MD036": false,
"blank_lines": false,
"MD041": false
}

.proselintrc disables or enables individual checks.

{
"checks": {
"airlinese.misc": false
, "annotations.misc": false
, "archaism.misc": false
, "cliches.hell": true
, "cliches.misc": true
, "consistency.spacing": true
, "consistency.spelling": true
, "corporate_speak.misc": true
, "cursing.filth": false
, "cursing.nfl": false
, "dates_times.am_pm": true
, "dates_times.dates": true
, "hedging.misc": true
, "hyperbole.misc": true
, "jargon.misc": false
, "lgbtq.offensive_terms": true
, "lgbtq.terms": true
, "lexical_illusions.misc": false
, "links.broken": true
, "malapropisms.misc": true
, "misc.apologizing": true
, "misc.back_formations": true
, "misc.bureaucratese": true
, "misc.but": true
, "misc.capitalization": true
, "misc.chatspeak": true
, "misc.commercialese": true
, "misc.currency": true
, "misc.debased": true
, "misc.false_plurals": true
, "misc.illogic": true
, "misc.inferior_superior": true
, "misc.latin": true
, "misc.many_a": true
, "misc.metaconcepts": true
, "misc.narcissism": true
, "misc.phrasal_adjectives": false
, "misc.preferred_forms": true
, "misc.pretension": true
, "misc.professions": true
, "misc.punctuation": true
, "misc.scare_quotes": true
, "misc.suddenly": true
, "misc.tense_present": true
, "misc.waxed": true
, "misc.whence": true
, "mixed_metaphors.misc": true
, "mondegreens.misc": true
, "needless_variants.misc": true
, "nonwords.misc": true
, "oxymorons.misc": true
, "psychology.misc": true
, "redundancy.misc": true
, "redundancy.ras_syndrome": true
, "skunked_terms.misc": true
, "spelling.able_atable": true
, "spelling.able_ible": true
, "spelling.athletes": false
, "spelling.em_im_en_in": true
, "spelling.er_or": true
, "spelling.in_un": true
, "spelling.misc": true
, "security.credit_card": true
, "security.password": true
, "sexism.misc": true
, "terms.animal_adjectives": true
, "terms.eponymous_adjectives": true
, "terms.venery": true
, "typography.diacritical_marks": true
, "typography.exclamation": true
, "typography.symbols": false
, "uncomparables.misc": true
, "weasel_words.misc": true
, "weasel_words.very": true
}
}

.sqlfluff configures the well-known SQL linter. Note that this file only applies to SQL files outside of dbt. Everything related to dbt is ignored here, because dbt needs a slightly different version of the same file, which I will show later. Configure the SQL style as you like, for example always using leading commas or requiring lowercase table, view, and column names. Important: define your database dialect at the top. In my case it’s databricks.

[sqlfluff]
dialect = databricks
max_line_length = 120

[sqlfluff:indentation]
tab_space_size = 4
indent_unit = space
indented_joins = false
indented_using_on = false
allow_implicit_indents = True

[sqlfluff:rules:aliasing.table]
aliasing = explicit

[sqlfluff:rules:aliasing.column]
aliasing = explicit

[sqlfluff:rules:aliasing.expression]
allow_scalar = True

[sqlfluff:rules:ambiguous.column_references]
group_by_and_order_by_style = explicit

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.identifiers] # Tables, columns, views
extended_capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions] # Function names
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.literals] # Null & Boolean Literals
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.types] # datatypes
extended_capitalisation_policy = lower

[sqlfluff:rules:layout.spacing] # removal of trailing whitespace
no_trailing_whitespace = true
extra_whitespace = true

[sqlfluff:rules:layout.commas] # Leading comma enforcement
line_position = leading

[sqlfluff:layout:type:comma] # Added with conjunction with leading commas
line_position = leading

.tflint.hcl, which in this case only sets a required version.

tflint {
  required_version = ">= 0.54"
}

.yamllint.yml also contains just a few settings, most notably the allowed line length.

extends: default
rules:
  new-lines:
    level: warning
    type: unix
  line-length:
    max: 500
  document-start:
    present: false
  comments:
    min-spaces-from-content: 1 # Used to follow prettier standard: https://github.com/prettier/prettier/pull/10926

Linters outside of MegaLinter

MegaLinter cannot incorporate all existing linters. However, the project offers a way to extend it with a custom linter of your choice (I haven’t tested it yet, maybe in the future).

I created my own Dockerfile and a simple shell script to use instead. The first leverages sqlfluff for dbt, while the second ensures Git branch names follow specific rules.

Let’s start with sqlfluff for dbt. The configuration file, .sqlfluff, remains the same as before, with a slight adjustment at the beginning:

[sqlfluff]
dialect = databricks
templater = dbt
max_line_length = 150

This file must now be stored in the root of your dbt project, rather than the root of the repository. Additionally, if you need to exclude certain directories or files from linting, place a .sqlfluffignore file in the same location.
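Since a .sqlfluff without the dbt templater in the wrong place fails in confusing ways, a quick sanity check can be useful. The following sketch assumes the dbt project lives in a directory named dwh, as in the rest of this article; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: verify that a dbt project directory carries its own .sqlfluff
# with the dbt templater enabled. The function name is hypothetical.

dbt_sqlfluff_ready() {
    project_dir=${1:-dwh}
    config="$project_dir/.sqlfluff"
    if [ ! -f "$config" ]; then
        printf 'missing %s\n' "$config" >&2
        return 1
    fi
    # Look for the "templater = dbt" line shown above.
    grep -Eq '^[[:space:]]*templater[[:space:]]*=[[:space:]]*dbt[[:space:]]*$' "$config"
}
```

For example, dbt_sqlfluff_ready dwh returns a non-zero status if the file is missing or still uses the default templater.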

The required Dockerfile:

FROM python:3.13-slim

WORKDIR /app

COPY pyproject.toml uv.lock ./docker/entrypoint.sh ./

RUN apt-get update \
&& apt-get install build-essential=12.9 --no-install-recommends -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& useradd -ms /bin/bash dbt-generic-user \
&& pip install uv==0.6.9 --no-cache-dir \
&& uv sync \
&& chown -R dbt-generic-user:dbt-generic-user /app \
&& chmod +x /app/.venv/bin/activate \
&& chmod +x /app/entrypoint.sh

USER dbt-generic-user

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 CMD uv --version || exit 1

ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["lint"]

Additionally, the project file pyproject.toml is required, along with entrypoint.sh. The latter is stored in the docker subdirectory alongside the Dockerfile, while the project file is placed in the root of the repository.

The project file:

[project]
name = "linters"
version = "0.4.0"
description = "Some text"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"dbt-core ~= 1.9",
"dbt-databricks ~= 1.9",
"pytest ~= 8.3",
"python-dotenv ~= 1.0",
"sqlfluff-templater-dbt ~= 3.3.0",
]

The shell script entrypoint:

#!/bin/bash

# shellcheck disable=SC1091
source .venv/bin/activate
cd /app/dwh || exit 1

echo "Running SQLFluff..."
if ! sqlfluff "$@"; then
    echo "sqlfluff failed. Exiting."
    exit 2
fi

echo "Linting completed successfully."

Please note that sqlfluff will establish a connection to your database, requiring a functional dbt profile. If you run the linter locally, you can use your default profile. However, for a CI/CD process, you must specify a profile—such as profiles-pipeline.yml—configured with environment variables.

dwh:
  outputs:
    prd:
      catalog: bronze
      host:
      http_path:
      schema: default
      threads: 4
      auth_type: oauth
      type: databricks
      client_id: "{{ env_var('DBT_ENV_SECRET_DATABRICKS_CLIENT_ID') }}"
      client_secret: "{{ env_var('DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET') }}"
  target: prd
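Because the pipeline profile above relies on environment variables, it can save time to fail early when one of them is not set. Here is a minimal sketch of such a preflight check; the helper names are assumptions, while the variable names are taken from the profile.

```shell
#!/bin/bash
# Sketch: fail early when a variable referenced by the pipeline profile is unset.
# The helper names are hypothetical; the variable names come from the profile above.

missing_env_vars() {
    for name in "$@"; do
        # Indirect lookup that works in plain POSIX shells as well as bash.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

check_dbt_secrets() {
    missing=$(missing_env_vars \
        DBT_ENV_SECRET_DATABRICKS_CLIENT_ID \
        DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET)
    if [ -n "$missing" ]; then
        printf 'Unset variables:\n%s\n' "$missing" >&2
        return 1
    fi
}
```

Running check_dbt_secrets before the linter gives a clearer message than a failed database connection would.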

Locally, the container can be run with:

docker run --rm -v ./dwh:/app/dwh -v ~/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml sqlfluff-dbt-linter

To apply automated fixes, append fix to the command. The entrypoint passes it to sqlfluff in place of the default lint.

In a pipeline, make sure the mounted profile path points to the profile stored in your repository rather than to ~/.dbt/profiles.yml.

Another linter checks Git branch names. Personally, I prefer branches to follow a specific pattern — starting with a prefix like feature, fix, or test, followed by a slash and then a descriptive purpose, ideally including a ticket number.

For example: feature/db-123-financial-data.

My file check_git_branch_name.sh, stored in the .linters directory:

#!/bin/bash

# Define color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

# Unicode symbols
CHECK_MARK="${GREEN}✔${NC}"
CROSS_MARK="${RED}✖${NC}"

# Get the current branch name
BRANCH_NAME=${1:-$(git rev-parse --abbrev-ref HEAD)}

# Define the branch name pattern
BRANCH_NAME_REGEX="^(feature|fix|hotfix|chore|refactor|test|docs)/[a-z0-9._-]+$"

# Check the branch name against the pattern
if [[ ! $BRANCH_NAME =~ $BRANCH_NAME_REGEX ]]; then
    echo -e "\n\n${CROSS_MARK} ${RED}Error:${NC} Branch name '${BRANCH_NAME}' does not follow the naming convention.\n"
    echo -e "${RED}Branch names must match the pattern:${NC} $BRANCH_NAME_REGEX"
    echo -e "${RED}Do you use feature/fix/hotfix/chore/refactor/test/docs as a prefix?${NC}"
    echo -e "${RED}Do you use only lowercase letters, numbers, dots, underscores, and hyphens in the branch name?${NC}"
    echo -e "${RED}You can rename your branch by running:${NC} git branch -m \n\n"
    exit 1
fi

echo -e "\n\n${CHECK_MARK} ${GREEN}Branch name '${BRANCH_NAME}' is valid.${NC}\n\n"
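To get a feeling for the pattern, it helps to try it against a few names. The sketch below reuses the same regular expression with grep -E; apart from feature/db-123-financial-data, the sample names are invented. The second helper assumes the branch name arrives in the form refs/heads/..., as in the predefined Azure DevOps variable System.PullRequest.SourceBranch; treat that wiring as an assumption.

```shell
#!/bin/bash
# Sketch: try the naming pattern against a few sample branch names.
# Helper names and most sample names are invented for illustration.

BRANCH_NAME_REGEX='^(feature|fix|hotfix|chore|refactor|test|docs)/[a-z0-9._-]+$'

# Succeeds when the given name matches the pattern.
is_valid_branch() {
    printf '%s\n' "$1" | grep -Eq "$BRANCH_NAME_REGEX"
}

# Strips a leading refs/heads/ so a full ref can be checked as well.
normalize_branch() {
    printf '%s\n' "${1#refs/heads/}"
}

for name in feature/db-123-financial-data fix/typo Feature/DB-123 main; do
    if is_valid_branch "$name"; then
        printf 'valid:   %s\n' "$name"
    else
        printf 'invalid: %s\n' "$name"
    fi
done
```

The first two names are reported as valid. Feature/DB-123 fails because of the uppercase letters, and main fails because it has no prefix.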

Execution

I’ve already explained how to execute sqlfluff with dbt using Docker, and running a shell script should be straightforward.

MegaLinter requires npx and can be executed with:

npx mega-linter-runner

I recommend consolidating all these commands into a single Makefile or shell script. You can also extend it by adding the fix option to allow linters to automatically resolve any issues they can.

For example:

lint: ## Lints the code using sqlfluff, the branch name check, and MegaLinter
	@docker run --rm -v ./dwh:/app/dwh -v ~/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml sqlfluff-dbt-linter
	@bash ./.linters/check_git_branch_name.sh
	@npx mega-linter-runner
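If you prefer a shell script over a Makefile, the same three steps could be wrapped roughly as follows. This is a sketch under a few assumptions: the function name, the RUNNER hook (set RUNNER=echo to print the commands instead of running them), and the DBT_PROFILE variable are made up for this example. The --fix flag is the mega-linter-runner option for applying fixes.

```shell
#!/bin/bash
# Sketch of a lint.sh wrapper: run all three checks in order, stop at the first failure.
# RUNNER and DBT_PROFILE are hypothetical hooks introduced for this example.

run_all_linters() {
    mode=${1:-lint}   # "lint" or "fix"
    runner=${RUNNER:-}
    profile=${DBT_PROFILE:-$HOME/.dbt/profiles.yml}

    # 1. sqlfluff for dbt inside the custom image; the mode is passed to the entrypoint.
    $runner docker run --rm \
        -v "$PWD/dwh:/app/dwh" \
        -v "$profile:/home/dbt-generic-user/.dbt/profiles.yml" \
        sqlfluff-dbt-linter "$mode" || return 1

    # 2. Branch name check (the script uses [[ ]], so it needs bash).
    $runner bash ./.linters/check_git_branch_name.sh || return 2

    # 3. MegaLinter, optionally applying fixes.
    if [ "$mode" = fix ]; then
        $runner npx mega-linter-runner --fix || return 3
    else
        $runner npx mega-linter-runner || return 3
    fi
}
```

Calling RUNNER=echo run_all_linters fix is a cheap way to review the commands before running them for real. In a pipeline, DBT_PROFILE could point to the profile stored in the repository.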

In an Azure DevOps pipeline, it could look like this:

trigger: none

pr:
  branches:
    include:
      - master

pool:
  vmImage: ubuntu-latest

steps:
  # Checkout triggering repo
  # Build Docker image from the Dockerfile in the repository
  # Run the Docker container built from the Dockerfile
  # Pull MegaLinter docker image
  # Run MegaLinter

Final remarks:

By keeping these considerations in mind, you’ll have a more efficient and consistent linting process, both locally and in your CI/CD pipeline, while also fostering a culture of quality coding within your team.