Integrating MegaLinter to Automate Linting Across Multiple Codebases. A Technical Description.
Working as a team on a common codebase is often challenging.
If you’re not familiar with linters, or with MegaLinter specifically, please take a look at my previous article on the topic. Unlike that article, this one focuses on setting up several linters through MegaLinter for Python, Docker, SQL, YAML, Bash, JSON, Markdown, Make, Terraform, the repository itself, and spellchecking. It also demonstrates how to use SQLFluff not only for plain SQL but also for dbt, and introduces a shell script that checks Git branch names against a pattern. Finally, everything is integrated into an Azure DevOps pipeline for use within a CI/CD process. The following linters will be used:
- bash-exec: Checks if shell files are executable. In MegaLinter, this is referred to as BASH_EXEC.
- shellcheck: Provides warnings and suggestions for bash/sh scripts. In MegaLinter, this is referred to as BASH_SHELLCHECK.
- jsonlint: A JSON/CJSON/JSON5 parser, validator and pretty-printer. In MegaLinter, this is referred to as JSON_JSONLINT.
- jsonprettier: Code formatter that enforces a consistent style. In MegaLinter, this is referred to as JSON_PRETTIER.
- jsonv8r: Checks the validity of JSON files if they have a matching schema defined on schemastore.org. In MegaLinter, this is referred to as JSON_V8R.
- checkmake: Linter and analyser for Makefiles. I like it a lot, but it is currently disabled because of security issues; its last update was almost two years ago. In MegaLinter, this is referred to as MAKEFILE_CHECKMAKE.
- markdownlint: Checks for errors in Markdown files, and can also auto-fix some of them. In MegaLinter, this is referred to as MARKDOWN_MARKDOWNLINT.
- markdown-link-check: Checks all of the hyperlinks in a Markdown text to determine if they are alive or dead. In MegaLinter, this is referred to as MARKDOWN_MARKDOWN_LINK_CHECK.
- markdown-table-formatter: Checks the formatting of Markdown tables and applies fixes. In MegaLinter, this is referred to as MARKDOWN_MARKDOWN_TABLE_FORMATTER.
- black: A Python code formatter. In MegaLinter, this is referred to as PYTHON_BLACK.
- bandit: A tool designed to find common security issues in Python code. In MegaLinter, this is referred to as PYTHON_BANDIT.
- flake8: A Python tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check the style and quality of Python code. In MegaLinter, this is referred to as PYTHON_FLAKE8.
- isort: A Python utility/library to sort imports alphabetically and automatically separate them into sections and by type. In MegaLinter, this is referred to as PYTHON_ISORT.
- checkov: Prevents cloud misconfigurations and finds vulnerabilities at build time in infrastructure as code, container images and open source packages. In MegaLinter, this is referred to as REPOSITORY_CHECKOV.
- gitleaks: A tool for detecting secrets like passwords, API keys, and tokens in git repos, files, and whatever else you wanna throw at it via stdin. In MegaLinter, this is referred to as REPOSITORY_GITLEAKS.
- kics: Finds security vulnerabilities, compliance issues, and infrastructure misconfigurations early in the development cycle of your infrastructure-as-code. In MegaLinter, this is referred to as REPOSITORY_KICS.
- ls-lint: A fast directory and filename linter. In MegaLinter, this is referred to as REPOSITORY_LS_LINT.
- secretlint: Linting tool to prevent committing credentials. In MegaLinter, this is referred to as REPOSITORY_SECRETLINT.
- semgrep: Lightweight static analysis for many languages. Finds bug variants with patterns that look like source code. In MegaLinter, this is referred to as REPOSITORY_SEMGREP.
- trivy: Finds vulnerabilities, misconfigurations, secrets, and SBOM in containers, Kubernetes, code repositories, clouds and more. In MegaLinter, this is referred to as REPOSITORY_TRIVY.
- trufflehog: Finds, verifies, and analyzes leaked credentials. In MegaLinter, this is referred to as REPOSITORY_TRUFFLEHOG.
- lychee: Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more. In MegaLinter, this is referred to as SPELL_LYCHEE.
- proselint: A linter for English prose. In MegaLinter, this is referred to as SPELL_PROSELINT.
- sqlfluff: A modular SQL linter and auto-formatter with support for multiple dialects and templated code. Also used for dbt. In MegaLinter, this is referred to as SQL_SQLFLUFF.
- terragrunt: A flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale. In MegaLinter, this is referred to as TERRAFORM_TERRAGRUNT.
- tflint: A pluggable Terraform linter. In MegaLinter, this is referred to as TERRAFORM_TFLINT.
- yamlprettier: See jsonprettier above. Same tool, just for YAML. In MegaLinter, this is referred to as YAML_PRETTIER.
- yamlv8r: See jsonv8r above. Same tool, just for YAML. In MegaLinter, this is referred to as YAML_V8R.
Set a MegaLinter configuration file
In the root of your repository, create a file named .mega-linter.yml. This file contains the configuration for MegaLinter itself and for many, but not all, of the linters. My file:
# Configuration file for MegaLinter
# See all available variables at https://megalinter.io/configuration/
# and in linters documentation
APPLY_FIXES: none # all, none, or list of linter keys
BASH_SHELLCHECK_ARGUMENTS: -e "SC2162"
BASH_EXEC_FILTER_REGEX_EXCLUDE: "dbt_packages"
BASH_SHELLCHECK_FILTER_REGEX_EXCLUDE: "dbt_packages"
CLEAR_REPORT_FOLDER: true
DISABLE_ERRORS: false
ENABLE_LINTERS:
- BASH_EXEC
- BASH_SHELLCHECK
- JSON_JSONLINT
- JSON_PRETTIER
- JSON_V8R
- MAKEFILE_CHECKMAKE
- MARKDOWN_MARKDOWNLINT
- MARKDOWN_MARKDOWN_LINK_CHECK
- MARKDOWN_MARKDOWN_TABLE_FORMATTER
- PYTHON_BLACK
- PYTHON_BANDIT
- PYTHON_FLAKE8
- PYTHON_ISORT
- REPOSITORY_CHECKOV
- REPOSITORY_GITLEAKS
- REPOSITORY_KICS
- REPOSITORY_LS_LINT
- REPOSITORY_SECRETLINT
- REPOSITORY_SEMGREP
- REPOSITORY_TRIVY
- REPOSITORY_TRUFFLEHOG
- SPELL_LYCHEE
- SPELL_PROSELINT
- SQL_SQLFLUFF
- TERRAFORM_TERRAGRUNT
- TERRAFORM_TFLINT
- YAML_PRETTIER
- YAML_V8R
FAIL_IF_UPDATED_SOURCES: false
FILEIO_REPORTER: false
FILTER_REGEX_EXCLUDE: none
FLAVOR_SUGGESTIONS: false
FORMATTERS_DISABLE_ERRORS: false
IGNORE_GITIGNORED_FILES: true
LINTER_RULES_PATH: .linters # Directory for all linter configuration rules.
LOG_LEVEL: INFO
MARKDOWN_DEFAULT_STYLE: markdownlint
MARKDOWN_MARKDOWN_LINK_CHECK_FILTER_REGEX_EXCLUDE: "dbt_packages"
MARKDOWN_MARKDOWN_LINK_CHECK_RULES_PATH: .linters
MARKDOWN_MARKDOWNLINT_FILTER_REGEX_EXCLUDE: "dbt_packages"
MARKDOWN_MARKDOWN_TABLE_FORMATTER_FILTER_REGEX_EXCLUDE: "dbt_packages"
PARALLEL: true
PRINT_ALPACA: false
PYTHON_BANDIT_RULES_PATH: .linters
PYTHON_BANDIT_CONFIG_FILE: .bandit.yml
PYTHON_BANDIT_FILTER_REGEX_EXCLUDE: "dbt_packages"
PYTHON_BLACK_FILTER_REGEX_EXCLUDE: "dbt_packages"
PYTHON_FLAKE8_FILTER_REGEX_EXCLUDE: "dbt_packages"
PYTHON_FLAKE8_RULES_PATH: .linters
PYTHON_ISORT_FILTER_REGEX_EXCLUDE: "dbt_packages"
REPOSITORY_LS_LINT_RULES_PATH: .linters
REPOSITORY_SEMGREP_RULESETS:
  [
    "p/comment",
    "p/cwe-top-25",
    "p/docker-compose",
    "p/dockerfile",
    "p/owasp-top-ten",
    "p/python",
    "p/r2c-security-audit",
    "p/secure-defaults",
    "p/terraform",
  ]
SHOW_ELAPSED_TIME: true
SHOW_SKIPPED_LINTERS: false
SPELL_LYCHEE_FILTER_REGEX_EXCLUDE: "dbt_packages"
SPELL_PROSELINT_FILTER_REGEX_EXCLUDE: "dbt_packages"
SPELL_PROSELINT_RULES_PATH: .linters
SPELL_VALE_RULES_PATH: .linters
SQL_SQLFLUFF_CONFIG_FILE: .sqlfluff
SQL_SQLFLUFF_RULES_PATH: .linters
YAML_PRETTIER_FILTER_REGEX_EXCLUDE: "dbt_packages"
YAML_V8R_FILTER_REGEX_EXCLUDE: "dbt_packages"
Allow me to briefly explain my settings:
- APPLY_FIXES: MegaLinter can automatically fix your code according to the linter rules. I don’t want that to happen automatically, so I set it to none.
- BASH_SHELLCHECK_ARGUMENTS: The -e option excludes a rule, in this case SC2162.
- BASH_EXEC_FILTER_REGEX_EXCLUDE: Any files in the dbt_packages directory are ignored by the bash_exec linter. Ignoring this directory is necessary because it contains other Git repositories with files that might violate certain rules, often too many to fix comprehensively.
- BASH_SHELLCHECK_FILTER_REGEX_EXCLUDE: Set to dbt_packages, for the same reason as above.
- CLEAR_REPORT_FOLDER: Flag to clear files from the report folder (usually megalinter-reports) before starting the linting process. I don’t need to keep these reports, so I set it to true.
- DISABLE_ERRORS: Flag to have the linter complete with exit code 0 even if errors were detected. It is false by default, but I define it explicitly. After all, what’s the point of a linter if you disable errors?
- ENABLE_LINTERS: A list of all linters used in this repository. For an explanation, see above. Every linter you want to use must be specified here. Instead of enabling specific linters, you can also use DISABLE_LINTERS to enable all linters except those explicitly mentioned.
- FAIL_IF_UPDATED_SOURCES: If set to true, MegaLinter fails if a linter or formatter has autofixed sources, even if there are no errors.
- FILEIO_REPORTER: Uploads MegaLinter reports to file.io if set to true.
- FILTER_REGEX_EXCLUDE: Regular expression defining which files are excluded from linting. It is probably better to set exclusions for specific linters.
- FLAVOR_SUGGESTIONS: The main drawback of MegaLinter is its heavy resource usage. While caching minimizes the impact locally, execution in a CI/CD pipeline can take significant time. If you’re charged per execution time rather than a flat fee, this adds to your costs, though likely minimally compared to other expenses. However, not all linters are necessary for every project. To address this, MegaLinter can suggest a smaller flavor that fits your project; you enable these suggestions by setting it to true.
- FORMATTERS_DISABLE_ERRORS: Some linters may display warnings or errors if a file is improperly formatted. When set to false, formatter findings are reported as errors rather than warnings. I recommend always setting this to false.
- IGNORE_GITIGNORED_FILES: The recommended and default setting is true, meaning anything ignored by Git due to the .gitignore file is also ignored by MegaLinter.
- LINTER_RULES_PATH: Some linters require configuration files to be stored at the root of the repository. However, many allow using a custom directory. Create a directory of your choice and specify it here.
- LOG_LEVEL: How much output the script writes to the console. One of INFO, DEBUG, WARNING, or ERROR.
- MARKDOWN_DEFAULT_STYLE: Default Markdown style to check and apply, either markdownlint or remark-lint.
- MARKDOWN_MARKDOWN_LINK_CHECK_FILTER_REGEX_EXCLUDE: The markdown-link-check linter excludes anything in the dbt_packages directory.
From here on, settings that only point a linter to its configuration file in the .linters directory are not described. The same applies to settings that only exclude a specific directory from linting.
- PARALLEL: Process linters in parallel to improve overall MegaLinter performance. If true, linters of the same language or format are grouped in the same parallel process to avoid lock issues when fixing the same files.
- PRINT_ALPACA: Enables printing an alpaca image to the console.
- REPOSITORY_SEMGREP_RULESETS: List of rulesets to include. All available rulesets can be found in the Semgrep registry.
- SHOW_ELAPSED_TIME: Displays elapsed time in reports.
- SHOW_SKIPPED_LINTERS: Displays all disabled linters that MegaLinter could have run. Please note that from time to time MegaLinter disables certain linters because of bugs.
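As background for the BASH_SHELLCHECK_ARGUMENTS setting: SC2162 is the ShellCheck warning about read being used without -r. The short sketch below shows the behavior that rule warns about, so you can judge whether excluding it is acceptable for your scripts. The sample input and the variable names are made up for illustration.

```shell
# Illustrative input containing backslashes (not from the repository).
input='C:\temp\new'

# Without -r (what SC2162 flags): read treats backslashes as escape
# characters and drops them.
without_r=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# With -r: the line is read verbatim.
with_r=$(printf '%s\n' "$input" | { read -r line; printf '%s' "$line"; })

printf 'without -r: %s\n' "$without_r"
printf 'with -r:    %s\n' "$with_r"
```

If your scripts never read input that may contain backslashes, excluding the rule is mostly harmless; otherwise adding -r is usually the simpler choice.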
Set Linter Configuration
Some linters can be configured within the .mega-linter.yml file, while others require separate configuration files, which can often be placed in a custom directory, for example .linters in my case. However, some linters strictly require their files to be at the root of the repository. Let’s start with those. Note that these are not configuration files but ignore files: .semgrepignore, .sqlfluffignore, and .trivyignore.
My configuration files within the .linters directory are:
.bandit.yml excludes certain directories and skips rule B101.
#FILE: bandit.yml
exclude_dirs: ["venv", "megalinter-reports", "dbt_packages"]
#tests: ['B201', 'B301']
skips: ["B101"]
.checkov.yml just ignores the dbt_packages directory.
skip-path:
- /dwh/dbt_packages
.flake8 configures the well-known Python linter to ignore rules E501 and F821 and to exclude one file and the __pycache__ directory.
[flake8]
extend-ignore = E501, F821
exclude =
    test_copy_files.py,
    __pycache__
kics.config is set to check Docker and Terraform files, with the LTS Spark version check ignored.
verbose: true
type:
  - Dockerfile
  - Terraform
log-level: INFO
exclude-queries:
  - 5a627dfa-a4dd-4020-a4c6-5f3caf4abcd6 # Beta - Check use no LTS Spark Version --> Ignore check for LTS version in Spark
.ls-lint.yml sets naming rules per file extension. In my case, file names must always be lowercase or snake_case. Some directories are excluded.
ls:
  .sql: lowercase | snake_case
  .py: lowercase | snake_case
  .tf: lowercase | snake_case
  .sh: lowercase | snake_case
  .yml: lowercase | snake_case
  .json: lowercase | snake_case
  .txt: lowercase | snake_case
  .hcl: lowercase | snake_case
  .toml: lowercase | snake_case
  .config: lowercase | snake_case
  .env: lowercase | snake_case
ignore:
  - .git
  - dwh/dbt_packages
  - dwh/target
  - terraform/workspaces/.terraform
  - .venv
lychee.toml accepts response codes 200 and 429 and excludes certain URLs, files, and paths from the check.
# Accepts log level: "error", "warn", "info", "debug", "trace"
verbose = "info"
# Don't show interactive progress bar while checking links.
no_progress = true
accept = ["200", "429"]
exclude = [
"https://megalinter.io/configuration/",
"file:///tmp/lint/dwh/models/logo.png"
]
exclude_path = [
"logs",
"megalinter-reports",
".venv",
"dbt_packages"
]
.markdown-link-check.json just ignores certain patterns and defines which status codes count as alive.
{
"ignorePatterns": [
{
"pattern": "logo.png"
}
],
"retryOn429": true,
"retryCount": 5,
"aliveStatusCodes": [0, 200, 203, 404]
}
.markdownlint.json enables or disables certain rules.
{
"MD004": false,
"MD007": {
"indent": 2
},
"MD013": {
"line_length": 500,
"code_blocks": false
},
"MD026": {
"punctuation": ".,;:!。,;:"
},
"MD029": false,
"MD033": false,
"MD036": false,
"blank_lines": false,
"MD041": false
}
.proselintrc disables or enables individual checks.
{
"checks": {
"airlinese.misc": false
, "annotations.misc": false
, "archaism.misc": false
, "cliches.hell": true
, "cliches.misc": true
, "consistency.spacing": true
, "consistency.spelling": true
, "corporate_speak.misc": true
, "cursing.filth": false
, "cursing.nfl": false
, "dates_times.am_pm": true
, "dates_times.dates": true
, "hedging.misc": true
, "hyperbole.misc": true
, "jargon.misc": false
, "lgbtq.offensive_terms": true
, "lgbtq.terms": true
, "lexical_illusions.misc": false
, "links.broken": true
, "malapropisms.misc": true
, "misc.apologizing": true
, "misc.back_formations": true
, "misc.bureaucratese": true
, "misc.but": true
, "misc.capitalization": true
, "misc.chatspeak": true
, "misc.commercialese": true
, "misc.currency": true
, "misc.debased": true
, "misc.false_plurals": true
, "misc.illogic": true
, "misc.inferior_superior": true
, "misc.latin": true
, "misc.many_a": true
, "misc.metaconcepts": true
, "misc.narcissism": true
, "misc.phrasal_adjectives": false
, "misc.preferred_forms": true
, "misc.pretension": true
, "misc.professions": true
, "misc.punctuation": true
, "misc.scare_quotes": true
, "misc.suddenly": true
, "misc.tense_present": true
, "misc.waxed": true
, "misc.whence": true
, "mixed_metaphors.misc": true
, "mondegreens.misc": true
, "needless_variants.misc": true
, "nonwords.misc": true
, "oxymorons.misc": true
, "psychology.misc": true
, "redundancy.misc": true
, "redundancy.ras_syndrome": true
, "skunked_terms.misc": true
, "spelling.able_atable": true
, "spelling.able_ible": true
, "spelling.athletes": false
, "spelling.em_im_en_in": true
, "spelling.er_or": true
, "spelling.in_un": true
, "spelling.misc": true
, "security.credit_card": true
, "security.password": true
, "sexism.misc": true
, "terms.animal_adjectives": true
, "terms.eponymous_adjectives": true
, "terms.venery": true
, "typography.diacritical_marks": true
, "typography.exclamation": true
, "typography.symbols": false
, "uncomparables.misc": true
, "weasel_words.misc": true
, "weasel_words.very": true
}
}
.sqlfluff configures the well-known SQL linter. Note that this file applies only to SQL files outside of dbt. Anything related to dbt is ignored here, because dbt needs a similar but slightly different file, which I will show later. Configure the SQL style as you like, for example always using leading commas or requiring every table, view, and column name to be lowercase. Important: define your database dialect at the top. In my case it’s databricks.
[sqlfluff]
dialect = databricks
max_line_length = 120
[sqlfluff:indentation]
tab_space_size = 4
indent_unit = space
indented_joins = false
indented_using_on = false
allow_implicit_indents = True
[sqlfluff:rules:aliasing.table]
aliasing.table = explicit
[sqlfluff:rules:aliasing.column]
aliasing.column = explicit
[sqlfluff:rules:aliasing.expression]
allow_scalar = True
[sqlfluff:rules:ambiguous.column_references]
group_by_and_order_by_style = explicit
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.identifiers] # Tables, columns, views
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions] # Function names
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.literals] # Null & Boolean Literals
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.types] # datatypes
extended_capitalisation_policy = lower
[sqlfluff:rules:layout.spacing] # removal of trailing whitespace
no_trailing_whitespace = true
extra_whitespace = true
[sqlfluff:rules:layout.commas] # Leading comma enforcement
line_position = leading
[sqlfluff:layout:type:comma] # Added with conjunction with leading commas
line_position = leading
.tflint.hcl only sets a required version in this case.
tflint {
required_version = ">= 0.54"
}
.yamllint.yml also contains just a few settings, most notably the allowed line length.
extends: default
rules:
  new-lines:
    level: warning
    type: unix
  line-length:
    max: 500
  document-start:
    present: false
  comments:
    min-spaces-from-content: 1 # Used to follow prettier standard: https://github.com/prettier/prettier/pull/10926
Linters outside of MegaLinter
MegaLinter cannot incorporate all existing linters. However, it offers a way to extend it with a custom linter of your choice (though I haven’t tested it; maybe in the future).
I created my own Dockerfile and a simple shell script to use instead. The first leverages sqlfluff for dbt, while the second ensures Git branch names follow specific rules.
Let’s start with sqlfluff for dbt. The configuration file, .sqlfluff, remains the same as before, with a slight adjustment at the beginning:
[sqlfluff]
dialect = databricks
templater = dbt
max_line_length = 150
This file must now be stored in the root of your dbt project, rather than the root of the repository. Additionally, if you need to exclude certain directories or files from linting, place a .sqlfluffignore file in the same location.
The required Dockerfile:
FROM python:3.13-slim
WORKDIR /app
COPY pyproject.toml uv.lock ./docker/entrypoint.sh ./
RUN apt-get update \
    && apt-get install build-essential=12.9 --no-install-recommends -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && useradd -ms /bin/bash dbt-generic-user \
    && pip install uv==0.6.9 --no-cache-dir \
    && uv sync \
    && chown -R dbt-generic-user:dbt-generic-user /app \
    && chmod +x /app/.venv/bin/activate \
    && chmod +x /app/entrypoint.sh
USER dbt-generic-user
# Shell form, so that "|| exit 1" is interpreted by the shell
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 CMD uv --version || exit 1
ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["lint"]
Additionally, the project file pyproject.toml is required, along with entrypoint.sh. The latter is stored in the docker subdirectory alongside the Dockerfile, while the project file is placed in the root of the repository.
The project file:
[project]
name = "linters"
version = "0.4.0"
description = "Some text"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"dbt-core ~= 1.9",
"dbt-databricks ~= 1.9",
"pytest ~= 8.3",
"python-dotenv ~= 1.0",
"sqlfluff-templater-dbt ~= 3.3.0",
]
The shell script entrypoint:
#!/bin/bash
# shellcheck disable=SC1091
source .venv/bin/activate
cd /app/dwh || exit 1
echo "Running SQLFluff..."
if ! sqlfluff "$@"; then
    echo "sqlfluff failed. Exiting."
    exit 2
fi
echo "Linting completed successfully."
Please note that sqlfluff will establish a connection to your database, requiring a functional dbt profile. If you run the linter locally, you can use your default profile. However, for a CI/CD process, you must specify a profile, such as profiles-pipeline.yml, configured with environment variables.
dwh:
  outputs:
    prd:
      catalog: bronze
      host:
      http_path:
      schema: default
      threads: 4
      auth_type: oauth
      type: databricks
      client_id: "{{ env_var('DBT_ENV_SECRET_DATABRICKS_CLIENT_ID') }}"
      client_secret: "{{ env_var('DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET') }}"
  target: prd
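Because the pipeline profile depends on environment variables, it can help to check that they are set before starting the container, so a missing secret fails fast with a clear message. The following is a minimal sketch, not part of the original setup: the helper names missing_vars and check_dbt_env are hypothetical, and the variable names are the two referenced in the profile.

```shell
# Variable names taken from the dbt profile.
REQUIRED_VARS="DBT_ENV_SECRET_DATABRICKS_CLIENT_ID DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET"

# Hypothetical helper: prints the name of every required variable
# that is unset or empty, one per line.
missing_vars() {
    for name in $REQUIRED_VARS; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

# Hypothetical helper: returns non-zero and lists the gaps if anything is missing.
check_dbt_env() {
    missing=$(missing_vars)
    if [ -n "$missing" ]; then
        printf 'Missing environment variables:\n%s\n' "$missing" >&2
        return 1
    fi
    echo "All dbt profile variables are set."
}
```

In a pipeline step, one would call `check_dbt_env || exit 1` right before the docker run command.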
Locally, the container can be run with:
docker run --rm -v ./dwh:/app/dwh -v ~/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml sqlfluff-dbt-linter
To run automated fixes, simply extend the command with fix.
In a pipeline, make sure to update your home path to match the location of the profile, which must be stored in your repository.
Another linter checks Git branch names. Personally, I prefer branches to follow a specific pattern: a prefix like feature, fix, or test, followed by a slash and then a descriptive purpose, ideally including a ticket number.
For example: feature/db-123-financial-data.
My file check_git_branch_name.sh, stored within the .linters directory:
#!/bin/bash
# Define color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color
# Unicode symbols
CHECK_MARK="${GREEN}✔${NC}"
CROSS_MARK="${RED}✖${NC}"
# Get the current branch name
BRANCH_NAME=${1:-$(git rev-parse --abbrev-ref HEAD)}
# Define the branch name pattern
BRANCH_NAME_REGEX="^(feature|fix|hotfix|chore|refactor|test|docs)/[a-z0-9._-]+$"
# Check the branch name against the pattern
if [[ ! $BRANCH_NAME =~ $BRANCH_NAME_REGEX ]]; then
    echo -e "\n\n${CROSS_MARK} ${RED}Error:${NC} Branch name '${BRANCH_NAME}' does not follow the naming convention.\n"
    echo -e "${RED}Branch names must match the pattern:${NC} $BRANCH_NAME_REGEX"
    echo -e "${RED}Do you use feature/fix/hotfix/chore/refactor/test/docs as a prefix?${NC}"
    echo -e "${RED}Do you use only lowercase letters, numbers, dots, underscores, and hyphens in the branch name?${NC}"
    echo -e "${RED}You can rename your branch by running:${NC} git branch -m \n\n"
    exit 1
fi
echo -e "\n\n${CHECK_MARK} ${GREEN}Branch name '${BRANCH_NAME}' is valid.${NC}\n\n"
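To get a feeling for what the pattern accepts and rejects, it can be tried against a few sample names. The snippet below is only a sketch: it reuses the regular expression from the script, while the helper names is_valid_branch_name and check_samples and all sample branch names (apart from the example given earlier) are made up for illustration. It uses grep -E instead of the bash-only =~ operator, which should behave the same for this pattern.

```shell
# Same pattern as in check_git_branch_name.sh.
BRANCH_NAME_REGEX='^(feature|fix|hotfix|chore|refactor|test|docs)/[a-z0-9._-]+$'

# Hypothetical helper: exit status 0 if the name matches the pattern.
# LC_ALL=C keeps [a-z] strictly lowercase regardless of locale.
is_valid_branch_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq "$BRANCH_NAME_REGEX"
}

# Prints "expected actual name" for each illustrative sample.
check_samples() {
    while read -r expected name; do
        if is_valid_branch_name "$name"; then actual=valid; else actual=invalid; fi
        printf '%-8s %-8s %s\n' "$expected" "$actual" "$name"
    done <<'EOF'
valid feature/db-123-financial-data
valid fix/typo_in_readme
invalid Feature/DB-123
invalid feature/
invalid bugfix/db-123
invalid main
EOF
}

check_samples
```

Note that long-lived branches such as main or master do not match the pattern, so the check is best applied only to the source branch of a pull request.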
Execution
I’ve already explained how to execute sqlfluff with dbt using Docker, and running a shell script should be straightforward.
MegaLinter requires npx and can be executed with:
npx mega-linter-runner
I recommend consolidating all these commands into a single Makefile or shell script. You can also extend it by adding the fix option to allow linters to automatically resolve any issues they can.
For example:
lint: ## Lints the code using sqlfluff, the branch name check and MegaLinter
	@docker run --rm -v ./dwh:/app/dwh -v ~/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml sqlfluff-dbt-linter
	@bash ./.linters/check_git_branch_name.sh
	@npx mega-linter-runner
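The same steps could also live in a small shell script with an optional fix mode. The following is only a sketch under a few assumptions: the function names run and lint_all and the DRY_RUN variable are my own inventions, the image name and paths are the ones used above, and the fix mode relies on the --fix option of mega-linter-runner. Note that --fix asks MegaLinter to apply fixes, which differs from the APPLY_FIXES: none setting in my configuration file.

```shell
#!/bin/bash
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: lint_all [lint|fix]   (defaults to lint)
lint_all() {
    mode=${1:-lint}

    # sqlfluff for dbt: the image's entrypoint passes the argument on to sqlfluff.
    run docker run --rm \
        -v "$PWD/dwh:/app/dwh" \
        -v "$HOME/.dbt/profiles.yml:/home/dbt-generic-user/.dbt/profiles.yml" \
        sqlfluff-dbt-linter "$mode" || return 1

    # Git branch name check (the script uses bash features, so run it with bash).
    run bash ./.linters/check_git_branch_name.sh || return 1

    # MegaLinter, with or without automatic fixes.
    if [ "$mode" = "fix" ]; then
        run npx mega-linter-runner --fix
    else
        run npx mega-linter-runner
    fi
}
```

Saved as, say, lint.sh with a final line `lint_all "$@"`, it can be called as `./lint.sh` or `./lint.sh fix`; `DRY_RUN=1 ./lint.sh fix` only prints what would be executed.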
In an Azure DevOps pipeline, it could look like this:
trigger: none

pr:
  branches:
    include:
      - master

pool:
  vmImage: ubuntu-latest

steps:
  # Checkout triggering repo
  - checkout: self
    displayName: Checkout Triggering Repository
    persistCredentials: "true"
    fetchDepth: "0"

  - script: |
      git fetch origin $(System.PullRequest.SourceBranch):source_branch
      git checkout source_branch

      # Strip 'refs/heads/' prefix to get only the branch name
      CLEAN_BRANCH_NAME=$(echo "$(System.PullRequest.SourceBranch)" | sed 's|refs/heads/||')
      echo "Clean branch name: $CLEAN_BRANCH_NAME"

      bash .linters/check_git_branch_name.sh "$CLEAN_BRANCH_NAME"
    displayName: Check Git Branch Name

  # Build Docker image from the Dockerfile in the repository
  - task: Docker@2
    displayName: "Build Docker Image from Repository"
    inputs:
      command: "build"
      Dockerfile: "docker/sqlfluff-dbt-linter"
      buildContext: "$(System.DefaultWorkingDirectory)"
      arguments: "-t sqlfluff-dbt-linter:latest"

  - script: |
      mkdir -p ./dwh/target
      chmod -R 777 ./dwh/target
    displayName: Ensure target folder permissions

  # Run the Docker container built from the Dockerfile
  - script: |
      docker run --rm --env-file .docker_env_file.env \
        -e DBT_ENV_SECRET_DATABRICKS_CLIENT_ID=$(DBT_ENV_SECRET_DATABRICKS_CLIENT_ID) \
        -e DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET=$(DBT_ENV_SECRET_DATABRICKS_CLIENT_SECRET) \
        -v $(System.DefaultWorkingDirectory)/dwh:/app/dwh \
        -v $(System.DefaultWorkingDirectory)/dwh/profiles/profiles-pipeline.yml:/home/dbt-generic-user/.dbt/profiles.yml \
        sqlfluff-dbt-linter
    displayName: Run dbt linter

  # Pull MegaLinter docker image
  - script: docker pull oxsecurity/megalinter:v8
    displayName: Pull MegaLinter

  # Run MegaLinter
  - script: |
      docker run -v $(System.DefaultWorkingDirectory):/tmp/lint \
        --env-file <(env | grep -e SYSTEM_ -e BUILD_ -e TF_ -e AGENT_) \
        -e SYSTEM_ACCESSTOKEN=$(System.AccessToken) \
        -e GIT_AUTHORIZATION_BEARER=$(System.AccessToken) \
        oxsecurity/megalinter:v8
    displayName: Run MegaLinter

  - task: Docker@2
    displayName: "Build Docker Image for dbt catalog"
    inputs:
      command: build
      Dockerfile: "docker/dbt-catalog"
      buildContext: "$(System.DefaultWorkingDirectory)"
      arguments: "-t dbt-catalog:latest"
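A small side note on the branch name step: the sed expression removes the first occurrence of refs/heads/ anywhere in the string. If only a leading prefix should be stripped, shell parameter expansion is an alternative that needs no external command. This is a minimal sketch; the function name strip_refs_heads is hypothetical.

```shell
# Hypothetical helper: removes a leading 'refs/heads/' from a Git ref
# and leaves everything else untouched.
strip_refs_heads() {
    printf '%s\n' "${1#refs/heads/}"
}

strip_refs_heads "refs/heads/feature/db-123-financial-data"
# prints: feature/db-123-financial-data
```

In the pipeline script, the assignment could then read `CLEAN_BRANCH_NAME=$(strip_refs_heads "$(System.PullRequest.SourceBranch)")`, assuming the function is defined earlier in the same step.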
Final remarks:
- MegaLinter Updates: If you don’t set a fixed version for MegaLinter, you will receive updates automatically. While this can bring improvements, it might also introduce new required fixes that weren’t necessary before, potentially causing inconsistencies in your workflow.
- Extensive Linter Coverage: MegaLinter includes a wide range of linters for various programming languages. Be sure to explore it and take advantage of additional linters for other types of files in your project. It can help ensure code quality across different languages and file types.
- Docker Image Size: MegaLinter is quite large, and pulling the full image can be around 10 GB. While this isn’t a problem locally, as it only needs to be pulled once and Docker caching is used, in a CI/CD pipeline, it would fetch everything every time. Consider using a specific linter flavor to optimize the pipeline build time and resource usage.
- Linting in a Team: Linters are valuable when working in teams, as they enforce consistent coding practices and improve overall code quality. However, they can also lead to many discussions about specific rules. Be cautious when disabling checks — some developers might be inclined to ignore formatting rules or fix issues only when necessary. It’s important to maintain discipline in adhering to standards. If you start making too many exceptions, you undermine the value of the linter itself.
By keeping these considerations in mind, you’ll have a more efficient and consistent linting process, both locally and in your CI/CD pipeline, while also fostering a culture of quality coding within your team.