fix(builder): strip UTF-8 BOM from .ino sources before preprocessing by ritesh006 · Pull Request #2983 · arduino/arduino-cli
Arduino CLI: Strip UTF‑8 BOM from .ino before preprocessing
Summary
When a sketch's .ino file is saved as UTF-8 with a BOM, the three BOM bytes (EF BB BF) reach the compiler and cause:
stray '\357' in program
stray '\273' in program
stray '\277' in program
This PR strips a leading BOM when sources are read, so the merged .cpp and any copied sources are clean.
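The three escapes in those errors are the BOM bytes written in octal: 0xEF = 0o357, 0xBB = 0o273, 0xBF = 0o277. The short Go sketch below reproduces that notation. `utf8BOM` and `octalEscapes` are illustrative names, not arduino-cli code.

```go
package main

import (
	"fmt"
	"strings"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF (the byte order mark).
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// octalEscapes renders each byte as a backslash plus three octal digits,
// the same notation used in the "stray '\NNN' in program" errors above.
func octalEscapes(b []byte) []string {
	out := make([]string, 0, len(b))
	for _, c := range b {
		out = append(out, fmt.Sprintf("\\%03o", c))
	}
	return out
}

// describeBOM joins the escapes for the BOM into a single line.
func describeBOM() string {
	return strings.Join(octalEscapes(utf8BOM), " ")
}
```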
Refs: #3015
Please check if the PR fulfills these requirements
- The PR has no duplicates (please search among the Pull Requests before creating one)
- The PR follows our contributing guidelines
- Tests for the changes have been added (for bug fixes / features)
- Docs have been added / updated (for bug fixes / features)
- UPGRADING.md has been updated with a migration guide (for breaking changes)
- configuration.schema.json updated if new parameters are added.
What kind of change does this PR introduce?
Bug fix: make the CLI robust to a UTF-8 BOM at the start of .ino and additional files.
What is the current behavior?
- If a .ino is saved as UTF-8 with BOM, the BOM bytes are carried into the merged .cpp, leading to compiler errors (stray '\357' / '\273' / '\277').
- This matches IDE issue arduino/arduino-ide#2752 and appears “random” to users because some editors add a BOM silently; a blank line after an initial block comment makes it easy to reproduce.
What is the new behavior?
- On reading sketch sources:
  - Strip a leading UTF-8 BOM before merging .ino files.
  - Strip a leading UTF-8 BOM when copying additional files.
- Result: BOM-prefixed sketches compile successfully. No behavior change for normal UTF-8 (no BOM) files.
Implementation notes
- Added helper:
    // stripUTF8BOM returns b without a leading UTF-8 byte order mark (EF BB BF), if present.
    func stripUTF8BOM(b []byte) []byte {
    	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
    		return b[3:]
    	}
    	return b
    }
- Applied in:
  - internal/arduino/builder/sketch.go → sketchMergeSources() (via getSource(...))
  - internal/arduino/builder/sketch.go → sketchCopyAdditionalFiles(...)
Test plan (manual)
- Create a minimal sketch:
    /* test */

    int x = 42;

    void setup() {
      Serial.begin(9600);
    }

    void loop() {
      Serial.println(x);
      delay(1000);
    }
- Save with BOM (VS Code → Save with Encoding → UTF-8 with BOM).
- Compile:
arduino-cli compile -b arduino:avr:uno
Before this patch: fails with:
stray '\357' in program
stray '\273' in program
stray '\277' in program
After this patch: succeeds.
Control: Save as UTF-8 (no BOM) → succeeds (unchanged).
Optional follow-up: add an automated test by placing a BOM-prefixed .ino in testdata and asserting that the merged output compiles.
Does this PR introduce a breaking change?
No. The change only strips a BOM if present; no impact on existing UTF-8 (no BOM) files or other encodings.
Other information
- The issue was reported in the IDE repo, but the root cause is in the CLI merge/preprocess path. Fixing it here resolves the problem for the IDE once it bundles a CLI containing this patch.
- Performance/overhead is negligible (constant-time 3-byte check per file).