Integrating MegaLinter to Automate Linting Across Multiple Codebases. A Technical Description.

Working as a team on a common codebase is often challenging.

If you’re not familiar with linters, or specifically with MegaLinter, please take a look at my previous article on the topic. In contrast to the previous article, this one focuses on implementing several linters using MegaLinter for Python, Docker, SQL, YAML, Bash, JSON, Markdown, Make, Terraform, the repository itself, and spellchecking. Additionally, an approach is demonstrated to implement SQLFluff not only for SQL but also for dbt. A shell script is also introduced, which tests for specific patterns in Git branch names. Finally, all of these are integrated into a pipeline on Azure DevOps for use within a CI/CD process. The following linters will be used:

bash-exec: Checks if shell files are executable. In MegaLinter, this is referred to as BASH_EXEC.
shellcheck: Provides warnings and suggestions for bash/sh scripts. In MegaLinter, this is referred to as BASH_SHELLCHECK.
jsonlint: A JSON/CJSON/JSON5 parser, validator and pretty-printer. In MegaLinter, this is referred to as JSON_JSONLINT.
jsonprettier: Code formatter that enforces a consistent style. In MegaLinter, this is referred to as JSON_PRETTIER.
jsonv8r: Checks the validity of JSON files that have a matching schema defined on schemastore.org. In MegaLinter, this is referred to as JSON_V8R.
checkmake: Linter and analyser for Makefiles. I like it a lot, but it is currently disabled due to security issues, and its last update was almost two years ago. In MegaLinter, this is referred to as MAKEFILE_CHECKMAKE.
markdownlint: Checks for errors in Markdown files, and can also auto-fix some of them. In MegaLinter, this is referred to as MARKDOWN_MARKDOWNLINT.
markdown-link-check: Checks all of the hyperlinks in a markdown text to determine if they are alive or dead. In MegaLinter, this is referred to as MARKDOWN_MARKDOWN_LINK_CHECK.
markdown-table-formatter: Checks the formatting of Markdown tables and applies fixes. In MegaLinter, this is referred to as MARKDOWN_MARKDOWN_TABLE_FORMATTER.
black: A Python code formatter. In MegaLinter, this is referred to as PYTHON_BLACK.
bandit: A tool designed to find common security issues in Python code. In MegaLinter, this is referred to as PYTHON_BANDIT.
flake8: A Python tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check the style and quality of Python code. In MegaLinter, this is referred to as PYTHON_FLAKE8.
isort: A Python utility/library that sorts imports alphabetically and automatically separates them into sections and by type. In MegaLinter, this is referred to as PYTHON_ISORT.
checkov: Prevents cloud misconfigurations and finds vulnerabilities at build time in infrastructure as code, container images, and open source packages. In MegaLinter, this is referred to as REPOSITORY_CHECKOV.
gitleaks: A tool for detecting secrets like passwords, API keys, and tokens in git repos, files, and whatever else you wanna throw at it via stdin. In MegaLinter, this is referred to as REPOSITORY_GITLEAKS.
kics: Find security vulnerabilities, compliance issues, and infrastructure misconfigurations early in the development cycle of your infrastructure-as-code. In MegaLinter, this is referred to as REPOSITORY_KICS.
ls-lint: A fast directory and filename linter. In MegaLinter, this is referred to as REPOSITORY_LS_LINT.
secretlint: Linting tool to prevent committing credentials. In MegaLinter, this is referred to as REPOSITORY_SECRETLINT.
semgrep: Lightweight static analysis for many languages. Find bug variants with patterns that look like source code. In MegaLinter, this is referred to as REPOSITORY_SEMGREP.
trivy: Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more. In MegaLinter, this is referred to as REPOSITORY_TRIVY.
trufflehog: Find, verify, and analyze leaked credentials. In MegaLinter, this is referred to as REPOSITORY_TRUFFLEHOG.
lychee: Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more. In MegaLinter, this is referred to as SPELL_LYCHEE.
proselint: A linter for English prose. In MegaLinter, this is referred to as SPELL_PROSELINT.
sqlfluff: A modular SQL linter and auto-formatter with support for multiple dialects and templated code. Used also for dbt. In MegaLinter, this is referred to as SQL_SQLFLUFF.
terragrunt: A flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale. In MegaLinter, this is referred to as TERRAFORM_TERRAGRUNT.
tflint: A pluggable Terraform Linter. In MegaLinter, this is referred to as TERRAFORM_TFLINT.
yamlprettier: See jsonprettier above. It is the same tool, applied to YAML. In MegaLinter, this is referred to as YAML_PRETTIER.
yamlv8r: See jsonv8r above. It is the same tool, applied to YAML. In MegaLinter, this is referred to as YAML_V8R.

Set a MegaLinter configuration file

In the root of your repository, create a file named .mega-linter.yml. This file contains the configuration for MegaLinter itself and for many, but not all, of the linters. My file:

# Configuration file for MegaLinter
#
# See all available variables at https://megalinter.io/configuration/
# and in linters documentation

APPLY_FIXES: none # all, none, or list of linter keys
BASH_SHELLCHECK_ARGUMENTS: -e "SC2162"
BASH_EXEC_FILTER_REGEX_EXCLUDE: "dbt_packages"
BASH_SHELLCHECK_FILTER_REGEX_EXCLUDE: "dbt_packages"
CLEAR_REPORT_FOLDER: true
DISABLE_ERRORS: false
ENABLE_LINTERS:
  - BASH_EXEC
  - BASH_SHELLCHECK
  - JSON_JSONLINT
  - JSON_PRETTIER
  - JSON_V8R
  - MAKEFILE_CHECKMAKE
  - MARKDOWN_MARKDOWNLINT
  - MARKDOWN_MARKDOWN_LINK_CHECK
  - MARKDOWN_MARKDOWN_TABLE_FORMATTER
  - PYTHON_BLACK
  - PYTHON_BANDIT
  - PYTHON_FLAKE8
  - PYTHON_ISORT
  - REPOSITORY_CHECKOV
  - REPOSITORY_GITLEAKS
  - REPOSITORY_KICS
  - REPOSITORY_LS_LINT
  - REPOSITORY_SECRETLINT
  - REPOSITORY_SEMGREP
  - REPOSITORY_TRIVY
  - REPOSITORY_TRUFFLEHOG
  - SPELL_LYCHEE
  - SPELL_PROSELINT
  - SQL_SQLFLUFF
  - TERRAFORM_TERRAGRUNT
  - TERRAFORM_TFLINT
  - YAML_PRETTIER
  - YAML_V8R
FAIL_IF_UPDATED_SOURCES: false
FILEIO_REPORTER: false
FILTER_REGEX_EXCLUDE: none
FLAVOR_SUGGESTIONS: false
FORMATTERS_DISABLE_ERRORS: false
IGNORE_GITIGNORED_FILES: true
LINTER_RULES_PATH: .linters # Directory for all linter configuration rules.
LOG_LEVEL: INFO
MARKDOWN_DEFAULT_STYLE: markdownlint
MARKDOWN_MARKDOWN_LINK_CHECK_FILTER_REGEX_EXCLUDE: "dbt_packages"
MARKDOWN_MARKDOWN_LINK_CHECK_RULES_PATH: .linters
MARKDOWN_MARKDOWNLINT_FILTER_REGEX_EXCLUDE: "dbt_packages"
MARKDOWN_MARKDOWN_TABLE_FORMATTER_FILTER_REGEX_EXCLUDE: "dbt_packages"
PARALLEL: true
PRINT_ALPACA: false
PYTHON_BANDIT_RULES_PATH: .linters
PYTHON_BANDIT_CONFIG_FILE: .bandit.yml
PYTHON_BANDIT_FILTER_REGEX_EXCLUDE: "dbt_packages"
PYTHON_BLACK_FILTER_REGEX_EXCLUDE: "dbt_packages"
PYTHON_FLAKE8_FILTER_REGEX_EXCLUDE: "dbt_packages"
PYTHON_FLAKE8_RULES_PATH: .linters
PYTHON_ISORT_FILTER_REGEX_EXCLUDE: "dbt_packages"
REPOSITORY_LS_LINT_RULES_PATH: .linters
REPOSITORY_SEMGREP_RULESETS:
  [
    "p/comment",
    "p/cwe-top-25",
    "p/docker-compose",
    "p/dockerfile",
    "p/owasp-top-ten",
    "p/python",
    "p/r2c-security-audit",
    "p/secure-defaults",
    "p/terraform",
  ]
SHOW_ELAPSED_TIME: true
SHOW_SKIPPED_LINTERS: false
SPELL_LYCHEE_FILTER_REGEX_EXCLUDE: "dbt_packages"
SPELL_PROSELINT_FILTER_REGEX_EXCLUDE: "dbt_packages"
SPELL_PROSELINT_RULES_PATH: .linters
SPELL_VALE_RULES_PATH: .linters
SQL_SQLFLUFF_CONFIG_FILE: .sqlfluff
SQL_SQLFLUFF_RULES_PATH: .linters
YAML_PRETTIER_FILTER_REGEX_EXCLUDE: "dbt_packages"
YAML_V8R_FILTER_REGEX_EXCLUDE: "dbt_packages"

Allow me to explain my settings briefly:

APPLY_FIXES: MegaLinter can automatically fix your code according to the linter rules. I don't want fixes applied automatically, so I set this to none.
BASH_SHELLCHECK_ARGUMENTS: The -e flag excludes a rule, in this case SC2162.
BASH_EXEC_FILTER_REGEX_EXCLUDE: Any files in the “dbt_packages” directory will be ignored by the bash_exec linter. Ignoring this directory is necessary because it contains other Git repositories with files that might violate certain rules, often too many to fix comprehensively.
BASH_SHELLCHECK_FILTER_REGEX_EXCLUDE: “dbt_packages”. Same as described above.
CLEAR_REPORT_FOLDER: Flag to clear files from report folder (usually megalinter-reports) before starting the linting process. I don’t need to keep these reports, so I set it to true.
DISABLE_ERRORS: Flag to have the linter complete with exit code 0 even if errors were detected. By default, it is set to false, but I explicitly define it. After all, what’s the point of a linter if you disable errors?
ENABLE_LINTERS: A list of all linters used in this repository. For an explanation, see above. Every linter you want to use must be specified here. Instead of enabling specific linters, you can also use DISABLE_LINTERS to enable all linters except those explicitly mentioned.
FAIL_IF_UPDATED_SOURCES: If set to true, MegaLinter fails if a linter or formatter has autofixed sources, even if there are no errors.
FILEIO_REPORTER: Uploads MegaLinter reports to file.io if set to true.
FILTER_REGEX_EXCLUDE: Regular expression defining which files will be excluded from linting. It is probably better to set exclusions for specific linters.
FLAVOR_SUGGESTIONS: The main drawback of MegaLinter is its heavy resource usage. While caching minimizes the impact locally, execution in a CI/CD pipeline can take significant time. If you're charged per execution time rather than a flat fee, this adds to your costs, though likely minimally compared to other expenses. Not every project needs all linters, however. MegaLinter can therefore suggest a lighter flavor that covers the linters you actually use; set this to true to see those suggestions.
FORMATTERS_DISABLE_ERRORS: Formatters report files that are improperly formatted. When this is set to false, those findings count as errors rather than warnings. I recommend always setting it to false.
IGNORE_GITIGNORED_FILES: The recommended and default setting is true, meaning anything ignored by Git due to the .gitignore file is also ignored by MegaLinter.
LINTER_RULES_PATH: Some linters require configuration files to be stored at the root of the repository. However, many allow using a custom directory. Create a directory of your choice and specify it here.
LOG_LEVEL: How much output the script will generate to the console. One of INFO, DEBUG, WARNING or ERROR.
MARKDOWN_DEFAULT_STYLE: The default Markdown style to check and apply, either markdownlint or remark-lint.
MARKDOWN_MARKDOWN_LINK_CHECK_FILTER_REGEX_EXCLUDE: The markdown-link-check linter skips everything in the dbt_packages directory.
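
For context on the rule excluded through BASH_SHELLCHECK_ARGUMENTS: SC2162 is ShellCheck's warning that read without -r treats backslashes as escape characters. Here is a small sketch of the behaviour the rule guards against. The helper names read_plain and read_raw and the sample input are made up for illustration.

```shell
# What SC2162 flags: read without -r consumes the backslash.
read_plain() {
    read line
    printf '%s\n' "$line"
}

# The form ShellCheck recommends: -r keeps the input verbatim.
read_raw() {
    read -r line
    printf '%s\n' "$line"
}

printf '%s\n' 'C:\temp' | read_plain   # prints C:temp
printf '%s\n' 'C:\temp' | read_raw     # prints C:\temp
```

If your scripts never read input containing backslashes, excluding the rule as done above is a reasonable trade-off.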

In the following, linters that simply look for their configuration file in the .linters directory are not described again. The same applies to settings that only exclude a specific directory from linting.

PARALLEL: Process linters in parallel to improve overall MegaLinter performance. If true, linters of same language or formats are grouped in the same parallel process to avoid lock issues if fixing the same files.
PRINT_ALPACA: Enable printing alpaca image to console.
REPOSITORY_SEMGREP_RULESETS: The list of Semgrep rulesets to include. All available rulesets are listed in the Semgrep Registry.
SHOW_ELAPSED_TIME: Displays elapsed time in reports.
SHOW_SKIPPED_LINTERS: Displays all disabled linters that MegaLinter could have run. Note that MegaLinter occasionally disables certain linters because of bugs.
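
The various FILTER_REGEX_EXCLUDE settings are regular expressions matched against file paths. MegaLinter does this matching internally. The sketch below only mimics the effect with grep, so you can preview which paths a pattern such as dbt_packages would drop. The function name filter_excluded and the file paths are invented for illustration.

```shell
# Print only the paths that do NOT match the exclusion regex given as $1.
filter_excluded() {
    grep -Ev "$1"
}

printf '%s\n' \
    "dwh/models/orders.sql" \
    "dwh/dbt_packages/dbt_utils/run_test.sh" \
    "scripts/deploy.sh" |
    filter_excluded "dbt_packages"
# prints dwh/models/orders.sql and scripts/deploy.sh
```

Because the value is a regex and not a directory name, a pattern like dbt_packages also matches any other path that merely contains that string.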

Set Linter Configuration

Some linters can be configured within the .mega-linter.yml file, while others require separate configuration files, which can often be placed in a custom directory, for example .linters in my case. However, some linters strictly require their files to be at the root of the repository. Let's start with those. In my case these are not configuration files but ignore files: .semgrepignore, .sqlfluffignore, and .trivyignore.

My configuration files within directory .linters are:

.bandit.yml, which excludes certain directories and skips rule B101.

#FILE: bandit.yml
exclude_dirs: ["venv", "megalinter-reports", "dbt_packages"]
#tests: ['B201', 'B301']
skips: ["B101"]

.checkov.yml which just ignores directory dbt_packages

skip-path:
  - /dwh/dbt_packages

.flake8 configures the famous Python linter. It ignores rules E501 and F821 and excludes one file and the __pycache__ directory.

[flake8]
extend-ignore = E501, F821
exclude =
    test_copy_files.py,
    __pycache__

kics.config is set to check Docker and Terraform files, with the LTS Spark version check ignored.

verbose: true
type:
  - Dockerfile
  - Terraform
log-level: INFO
exclude-queries:
  - 5a627dfa-a4dd-4020-a4c6-5f3caf4abcd6 # Beta - Check use no LTS Spark Version --> Ignore check for LTS version in Spark

.ls-lint.yml sets naming rules for certain file types. In my case, file names must always be lowercase or snake_case. Some directories are excluded.

ignore:
  - .git
  - dwh/dbt_packages
  - dwh/target
  - terraform/workspaces/.terraform
  - .venv

lychee.toml accepts response codes 200 and 429 and excludes certain URLs, files, and paths from the check.

# Accepts log level: "error", "warn", "info", "debug", "trace"
verbose = "info"

# Don't show interactive progress bar while checking links.
no_progress = true

accept = ["200", "429"]

exclude = [
  "https://megalinter.io/configuration/",
  "file:///tmp/lint/dwh/models/logo.png"
]

exclude_path = [
  "logs",
  "megalinter-reports",
  ".venv",
  "dbt_packages"
]

.markdown-link-check.json ignores certain patterns and defines which status codes count as alive.

{
  "ignorePatterns": [
    {
      "pattern": "logo.png"
    }
  ],
  "retryOn429": true,
  "retryCount": 5,
  "aliveStatusCodes": [0, 200, 203, 404]
}

.markdownlint.json enables or disables certain rules.

{
  "MD004": false,
  "MD007": {
    "indent": 2
  },
  "MD013": {
    "line_length": 500,
    "code_blocks": false
  },
  "MD026": {
    "punctuation": ".,;:!。，；:"
  },
  "MD029": false,
  "MD033": false,
  "MD036": false,
  "blank_lines": false,
  "MD041": false
}

.proselintrc enables or disables individual checks.

{
"checks": {
"airlinese.misc": false
, "annotations.misc": false
, "archaism.misc": false
, "cliches.hell": true
, "cliches.misc": true
, "consistency.spacing": true
, "consistency.spelling": true
, "corporate_speak.misc": true
, "cursing.filth": false
, "cursing.nfl": false
, "dates_times.am_pm": true
, "dates_times.dates": true
, "hedging.misc": true
, "hyperbole.misc": true
, "jargon.misc": false
, "lgbtq.offensive_terms": true
, "lgbtq.terms": true
, "lexical_illusions.misc": false
, "links.broken": true
, "malapropisms.misc": true
, "misc.apologizing": true
, "misc.back_formations": true
, "misc.bureaucratese": true
, "misc.but": true
, "misc.capitalization": true
, "misc.chatspeak": true
, "misc.commercialese": true
, "misc.currency": true
, "misc.debased": true
, "misc.false_plurals": true
, "misc.illogic": true
, "misc.inferior_superior": true
, "misc.latin": true
, "misc.many_a": true
, "misc.metaconcepts": true
, "misc.narcissism": true
, "misc.phrasal_adjectives": false
, "misc.preferred_forms": true
, "misc.pretension": true
, "misc.professions": true
, "misc.punctuation": true
, "misc.scare_quotes": true
, "misc.suddenly": true
, "misc.tense_present": true
, "misc.waxed": true
, "misc.whence": true
, "mixed_metaphors.misc": true
, "mondegreens.misc": true
, "needless_variants.misc": true
, "nonwords.misc": true
, "oxymorons.misc": true
, "psychology.misc": true
, "redundancy.misc": true
, "redundancy.ras_syndrome": true
, "skunked_terms.misc": true
, "spelling.able_atable": true
, "spelling.able_ible": true
, "spelling.athletes": false
, "spelling.em_im_en_in": true
, "spelling.er_or": true
, "spelling.in_un": true
, "spelling.misc": true
, "security.credit_card": true
, "security.password": true
, "sexism.misc": true
, "terms.animal_adjectives": true
, "terms.eponymous_adjectives": true
, "terms.venery": true
, "typography.diacritical_marks": true
, "typography.exclamation": true
, "typography.symbols": false
, "uncomparables.misc": true
, "weasel_words.misc": true
, "weasel_words.very": true
}
}

.sqlfluff configures the famous SQL linter. Note that this file applies only to SQL files outside of dbt. Everything related to dbt is ignored here, because dbt needs the same file with slight differences, which I will show later. Configure the SQL style as you like, for example always using leading commas, or requiring every table, view, and column to be lowercase. Important: define your SQL dialect at the top. In my case it's databricks.

[sqlfluff]
dialect = databricks
max_line_length = 120

[sqlfluff:indentation]
tab_space_size = 4
indent_unit = space
indented_joins = false
indented_using_on = false
allow_implicit_indents = True

[sqlfluff:rules:aliasing.table]
aliasing = explicit

[sqlfluff:rules:aliasing.column]
aliasing = explicit

[sqlfluff:rules:aliasing.expression]
allow_scalar = True

[sqlfluff:rules:ambiguous.column_references]
group_by_and_order_by_style = explicit

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.identifiers] # Tables, columns, views
extended_capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions] # Function names
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.literals] # Null & Boolean Literals
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.types] # datatypes
extended_capitalisation_policy = lower

[sqlfluff:rules:layout.spacing] # removal of trailing whitespace
no_trailing_whitespace = true
extra_whitespace = true

[sqlfluff:rules:layout.commas] # Leading comma enforcement
line_position = leading

[sqlfluff:layout:type:comma] # Added with conjunction with leading commas
line_position = leading

.tflint.hcl, which in this case only sets a required version.

tflint {
  required_version = ">= 0.54"
}

.yamllint.yml also has just a few settings, most notably the maximum line length.

extends: default
rules:
  new-lines:
    level: warning
    type: unix
  line-length:
    max: 500
  document-start:
    present: false
  comments:
    min-spaces-from-content: 1 # Used to follow prettier standard: https://github.com/prettier/prettier/pull/10926

Linters outside of MegaLinter

MegaLinter cannot incorporate every existing linter. It does offer a way to plug in a custom linter of your choice, though I haven't tested that yet (maybe in the future).

I created my own Dockerfile and a simple shell script to use instead. The first leverages sqlfluff for dbt, while the second ensures Git branch names follow specific rules.

Let’s start with sqlfluff for dbt. The configuration file, .sqlfluff, remains the same as before, with a slight adjustment at the beginning:

[sqlfluff]
dialect = databricks
templater = dbt
max_line_length = 150

This file must now be stored in the root of your dbt project, rather than the root of the repository. Additionally, if you need to exclude certain directories or files from linting, place a .sqlfluffignore file in the same location.

The required Dockerfile:

FROM python:3.13-slim

WORKDIR /app

COPY pyproject.toml uv.lock ./docker/entrypoint.sh ./

RUN apt-get update \
    && apt-get install build-essential=12.9 --no-install-recommends -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && useradd -ms /bin/bash dbt-generic-user \
    && pip install uv==0.6.9 --no-cache-dir \
    && uv sync \
    && chown -R dbt-generic-user:dbt-generic-user /app \
    && chmod +x /app/.venv/bin/activate \
    && chmod +x /app/entrypoint.sh

USER dbt-generic-user

# Shell form, so that "|| exit 1" is interpreted by the shell
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 CMD uv --version || exit 1

ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["lint"]

Additionally, the project file pyproject.toml is required, along with entrypoint.sh. The latter is stored in the docker subdirectory alongside the Dockerfile, while the project file is placed in the root of the repository.

The project file:

[project]
name = "linters"
version = "0.4.0"
description = "Some text"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "dbt-core ~= 1.9",
    "dbt-databricks ~= 1.9",
    "pytest ~= 8.3",
    "python-dotenv ~= 1.0",
    "sqlfluff-templater-dbt ~= 3.3.0",
]

The shell script entrypoint:

#!/bin/bash

# shellcheck disable=SC1091
source .venv/bin/activate
cd /app/dwh || exit 1

echo "Running SQLFluff..."
if ! sqlfluff "$@"; then
    echo "sqlfluff failed. Exiting."
    exit 2
fi

echo "Linting completed successfully."

Please note that sqlfluff will establish a connection to your database, requiring a functional dbt profile. If you run the linter locally, you can use your default profile. However, for a CI/CD process, you must specify a profile—such as profiles-pipeline.yml—configured with environment variables.

dwh:
  outputs:
    prd:
      catalog: bronze
      host:
      http_path:
      schema: default
      threads: 4
      auth_type: oauth
      type: databricks
      client_id: "{{ env_var('DBT_ENV_SECRET_DATABRICKS_CLIENT_ID') }}"
      client_secret: "{{ env_var('DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET') }}"
  target: prd
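
Because the pipeline profile reads its credentials through env_var, a missing variable only surfaces once dbt tries to render the profile. As a minimal sketch, a pre-flight check could run first. The function name require_env is hypothetical; the variable names are the two used in the profile above.

```shell
# Fail early with a clear message if a required environment variable
# is unset or empty. printenv only sees exported variables, which is
# also what "docker run -e" would pass on.
require_env() {
    for var_name in "$@"; do
        if [ -z "$(printenv "$var_name")" ]; then
            echo "Missing required environment variable: $var_name" >&2
            return 1
        fi
    done
}

# Example: check the two secrets before starting the linter container.
if require_env DBT_ENV_SECRET_DATABRICKS_CLIENT_ID DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET; then
    echo "dbt credentials are set."
else
    echo "Set the variables above before running the dbt linter."
fi
```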

Locally, the container can be run with:

docker run --rm -v ./dwh:/app/dwh -v ~/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml sqlfluff-dbt-linter

To run automated fixes, simply extend the command with fix.
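
Appending fix works because arguments after the image name replace the Dockerfile's default CMD (lint) and are forwarded to the entrypoint's "$@". The sketch below shows that hand-off with a stand-in function instead of the real sqlfluff binary, so it runs without Docker. The names fake_sqlfluff and entrypoint are hypothetical.

```shell
# Stand-in for the real sqlfluff executable (assumption for illustration).
fake_sqlfluff() {
    echo "sqlfluff $*"
}

# Mirrors the entrypoint script: forward all arguments unchanged.
# With no arguments, fall back to "lint", emulating what Docker does
# with CMD ["lint"] when nothing follows the image name.
entrypoint() {
    if [ "$#" -eq 0 ]; then
        set -- lint
    fi
    fake_sqlfluff "$@"
}

entrypoint        # prints: sqlfluff lint
entrypoint fix    # prints: sqlfluff fix
```

With the real image, the second call corresponds to adding fix at the end of the docker run command shown above.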

In a pipeline, mount the profile from its location in your repository rather than from your home directory.

Another linter checks Git branch names. Personally, I prefer branches to follow a specific pattern — starting with a prefix like feature, fix, or test, followed by a slash and then a descriptive purpose, ideally including a ticket number.

For example: feature/db-123-financial-data.

My file check_git_branch_name.sh, stored in the .linters directory:

#!/bin/bash

# Define color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

# Unicode symbols
CHECK_MARK="${GREEN}✔${NC}"
CROSS_MARK="${RED}✖${NC}"

# Get the current branch name
BRANCH_NAME=${1:-$(git rev-parse --abbrev-ref HEAD)}

# Define the branch name pattern (assumed here from the rules named in the messages below)
BRANCH_NAME_REGEX='^(feature|fix|hotfix|chore|refactor|test|docs)/[a-z0-9.-]+$'

# Check the branch name against the pattern
if [[ ! $BRANCH_NAME =~ $BRANCH_NAME_REGEX ]]; then
    echo -e "\n\n${CROSS_MARK} ${RED}Error:${NC} Branch name '${BRANCH_NAME}' does not follow the naming convention.\n"
    echo -e "${RED}Branch names must match the pattern:${NC} $BRANCH_NAME_REGEX"
    echo -e "${RED}Do you use feature/fix/hotfix/chore/refactor/test/docs as a prefix?${NC}"
    echo -e "${RED}Do you use only lowercase letters, numbers, dots, and hyphens in the branch name?${NC}"
    echo -e "${RED}You can rename your branch by running:${NC} git branch -m \n\n"
    exit 1
fi

echo -e "\n\n${CHECK_MARK} ${GREEN}Branch name '${BRANCH_NAME}' is valid.${NC}\n\n"
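
To try out a naming convention before wiring it into the pipeline, the check can be isolated into a small function. This is a minimal sketch under stated assumptions: the pattern is inferred from the rules named in the error messages, and the function name is_valid_branch_name is hypothetical. It uses grep -E rather than the bash-only [[ =~ ]] test so that it also runs under plain sh.

```shell
# Hypothetical pattern: an allowed prefix, a slash, then lowercase
# letters, digits, dots, and hyphens. Adjust it to your convention.
BRANCH_NAME_REGEX='^(feature|fix|hotfix|chore|refactor|test|docs)/[a-z0-9.-]+$'

# Returns 0 if the given branch name matches the pattern, 1 otherwise.
is_valid_branch_name() {
    printf '%s\n' "$1" | grep -Eq "$BRANCH_NAME_REGEX"
}

for name in "feature/db-123-financial-data" "Feature/DB-123" "db-123-financial-data"; do
    if is_valid_branch_name "$name"; then
        echo "valid:   $name"
    else
        echo "invalid: $name"
    fi
done
```

Only the first name passes. The second uses uppercase letters and the third lacks a prefix.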

Execution

I’ve already explained how to execute sqlfluff with dbt using Docker, and running a shell script should be straightforward.

MegaLinter requires npx and can be executed with:

npx mega-linter-runner

I recommend consolidating all these commands into a single Makefile or shell script. You can also extend it by adding the fix option to allow linters to automatically resolve any issues they can.

For example:

lint: ## Lints the code with sqlfluff for dbt, the branch name check, and MegaLinter
	@docker run --rm -v ./dwh:/app/dwh -v ~/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml sqlfluff-dbt-linter
	@bash ./.linters/check_git_branch_name.sh
	@npx mega-linter-runner

In an Azure DevOps pipeline, it could look like this:

trigger: none

pr:
  branches:
    include:
      - master

pool:
  vmImage: ubuntu-latest

steps:
  # Checkout triggering repo
  - checkout: self
    displayName: Checkout Triggering Repository
    persistCredentials: "true"
    fetchDepth: "0"

  - script: |
      git fetch origin $(System.PullRequest.SourceBranch):source_branch
      git checkout source_branch

      # Strip 'refs/heads/' prefix to get only the branch name
      CLEAN_BRANCH_NAME=$(echo "$(System.PullRequest.SourceBranch)" | sed 's|refs/heads/||')
      echo "Clean branch name: $CLEAN_BRANCH_NAME"

      bash .linters/check_git_branch_name.sh "$CLEAN_BRANCH_NAME"
    displayName: Check Git Branch Name

  # Build Docker image from the Dockerfile in the repository
  - task: Docker@2
    displayName: "Build Docker Image from Repository"
    inputs:
      command: "build"
      Dockerfile: "docker/sqlfluff-dbt-linter"
      buildContext: "$(System.DefaultWorkingDirectory)"
      arguments: "-t sqlfluff-dbt-linter:latest"

  - script: |
      mkdir -p ./dwh/target
      chmod -R 777 ./dwh/target
    displayName: Ensure target folder permissions

  # Run the Docker container built from the Dockerfile
  - script: |
      docker run --rm --env-file .docker_env_file.env \
        -e DBT_ENV_SECRET_DATABRICKS_CLIENT_ID=$(DBT_ENV_SECRET_DATABRICKS_CLIENT_ID) \
        -e DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET=$(DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET) \
        -v $(System.DefaultWorkingDirectory)/dwh:/app/dwh \
        -v $(System.DefaultWorkingDirectory)/dwh/profiles/profiles-pipeline.yml:/home/dbt-generic-user/.dbt/profiles.yml \
        sqlfluff-dbt-linter
    displayName: Run dbt linter

  # Pull MegaLinter docker image
  - script: docker pull oxsecurity/megalinter:v8
    displayName: Pull MegaLinter

  # Run MegaLinter
  - script: |
      docker run -v $(System.DefaultWorkingDirectory):/tmp/lint \
        --env-file <(env | grep -e SYSTEM_ -e BUILD_ -e TF_ -e AGENT_) \
        -e SYSTEM_ACCESSTOKEN=$(System.AccessToken) \
        -e GIT_AUTHORIZATION_BEARER=$(System.AccessToken) \
        oxsecurity/megalinter:v8
    displayName: Run MegaLinter

  - task: Docker@2
    displayName: "Build Docker Image for dbt catalog"
    inputs:
      command: build
      Dockerfile: "docker/dbt-catalog"
      buildContext: "$(System.DefaultWorkingDirectory)"
      arguments: "-t dbt-catalog:latest"
Final remarks:

MegaLinter Updates: If you don’t set a fixed version for MegaLinter, you will receive updates automatically. While this can bring improvements, it might also introduce new required fixes that weren’t necessary before, potentially causing inconsistencies in your workflow.
Extensive Linter Coverage: MegaLinter includes a wide range of linters for various programming languages. Be sure to explore it and take advantage of additional linters for other types of files in your project. It can help ensure code quality across different languages and file types.
Docker Image Size: MegaLinter is quite large; the full image can be around 10 GB. Locally this isn't a problem, since the image is pulled once and then cached by Docker, but a CI/CD pipeline pulls it on every run. Consider using a specific MegaLinter flavor to reduce pipeline build time and resource usage.
Linting in a Team: Linters are valuable when working in teams, as they enforce consistent coding practices and improve overall code quality. However, they can also lead to many discussions about specific rules. Be cautious when disabling checks — some developers might be inclined to ignore formatting rules or fix issues only when necessary. It’s important to maintain discipline in adhering to standards. If you start making too many exceptions, you undermine the value of the linter itself.
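
The remarks on versions and image size both come down to which image reference you pull. MegaLinter publishes flavor images under names of the form oxsecurity/megalinter-<flavor>. The helper below is a hypothetical sketch that assembles a pinned reference; the function name megalinter_image and the flavor and tag values are examples only. A flavor only helps if it contains every linter you enabled, so check the flavor's linter list first.

```shell
# Build a pinned MegaLinter image reference.
# $1: flavor name, or empty for the full image; $2: version tag.
megalinter_image() {
    flavor="$1"
    tag="$2"
    if [ -n "$flavor" ]; then
        printf 'oxsecurity/megalinter-%s:%s\n' "$flavor" "$tag"
    else
        printf 'oxsecurity/megalinter:%s\n' "$tag"
    fi
}

megalinter_image "" v8          # prints oxsecurity/megalinter:v8
megalinter_image python v8      # prints oxsecurity/megalinter-python:v8
```

The resulting string can be used wherever the pipeline above references oxsecurity/megalinter:v8.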

By keeping these considerations in mind, you’ll have a more efficient and consistent linting process, both locally and in your CI/CD pipeline, while also fostering a culture of quality coding within your team.