fix(builder): strip UTF-8 BOM from .ino sources before preprocessing by ritesh006 · Pull Request #2983 · arduino/arduino-cli

Arduino CLI: Strip UTF-8 BOM from .ino before preprocessing

Summary

When a sketch .ino is saved as UTF-8 with BOM, the three BOM bytes (EF BB BF) reach the compiler and cause:

stray '\357' in program
stray '\273' in program
stray '\277' in program

This PR strips the BOM at read-time so the merged .cpp and any copied sources are clean.
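
The three error messages map one-to-one onto the BOM bytes: the compiler prints each stray byte as an octal escape, and EF, BB, BF in hexadecimal are 357, 273, 277 in octal. A small illustrative sketch of that correspondence (the names `utf8BOM` and `octalEscapes` are made up for this example and are not part of the PR):

```go
package main

import "fmt"

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// octalEscapes renders each byte as a backslash-octal escape,
// the notation used in the "stray ... in program" diagnostics.
// Hypothetical helper, for illustration only.
func octalEscapes(b []byte) []string {
	out := make([]string, 0, len(b))
	for _, c := range b {
		out = append(out, fmt.Sprintf(`\%o`, c))
	}
	return out
}

func main() {
	for _, esc := range octalEscapes(utf8BOM) {
		fmt.Printf("stray '%s' in program\n", esc)
	}
}
```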

Refs: #3015


Please check if the PR fulfills these requirements

See how to contribute


What kind of change does this PR introduce?

Bug fix — make the CLI robust to UTF-8 BOM at the start of .ino and additional files.


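What is the current behavior?

A sketch .ino saved as UTF-8 with BOM fails to compile: the three BOM bytes (EF BB BF) reach the compiler, which reports stray '\357', '\273', and '\277' in program.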


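What is the new behavior?

The BOM is stripped at read time, so the merged .cpp and any copied sources are clean and the same sketch compiles. Files without a BOM are unaffected.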


Implementation notes

// stripUTF8BOM returns b without a leading UTF-8 BOM (EF BB BF), if one is present.
func stripUTF8BOM(b []byte) []byte {
	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
		return b[3:]
	}
	return b
}
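
To illustrate what "strips the BOM at read-time" could look like, here is a minimal, self-contained sketch. `stripUTF8BOM` is the helper from this PR; `readSource` is a hypothetical wrapper written for this example and is not the actual call site in the builder. The final check shows that, for this input, the helper gives the same result as the standard library's `bytes.TrimPrefix`.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// stripUTF8BOM returns b without a leading UTF-8 BOM, if one is present.
func stripUTF8BOM(b []byte) []byte {
	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
		return b[3:]
	}
	return b
}

// readSource is a hypothetical helper (an assumption for this example):
// it reads a source file and strips a leading BOM before returning it.
func readSource(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return stripUTF8BOM(data), nil
}

func main() {
	dir, err := os.MkdirTemp("", "bom-demo")
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	// Write a BOM-prefixed file, as an editor saving "UTF-8 with BOM" would.
	bom := []byte{0xEF, 0xBB, 0xBF}
	content := append(append([]byte{}, bom...), []byte("int x = 42;\n")...)
	path := filepath.Join(dir, "demo.ino")
	if err := os.WriteFile(path, content, 0o644); err != nil {
		fmt.Println("error:", err)
		return
	}

	src, err := readSource(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", src)

	// Same result as trimming the BOM prefix with the standard library.
	fmt.Println(bytes.Equal(src, bytes.TrimPrefix(content, bom)))
}
```

Stripping once where the bytes are read means every later consumer of the source sees the same BOM-free content.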


Test plan (manual)

  1. Create a minimal sketch:

/* test */

int x = 42;

void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.println(x);
  delay(1000);
}

  2. Save with BOM (VS Code → Save with Encoding → UTF-8 with BOM).
  3. Compile:

arduino-cli compile -b arduino:avr:uno

Before this patch: fails with:

stray '\357' in program
stray '\273' in program
stray '\277' in program

After this patch: succeeds.

Control: Save as UTF-8 (no BOM) → succeeds (unchanged).

(Optional follow-up): add an automated test by placing a BOM-prefixed .ino in testdata and asserting the merged output compiles.
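
As a narrower first step toward that follow-up, the helper itself could be covered by a table of cases. The sketch below is an assumption about how such a check might look, not the testdata-based compile test described above; `bomCase`, `bomCases`, and `failedCases` are names invented for this example. It is written as a standalone program so it runs on its own; in the repository the same table would more naturally live in a `_test.go` file.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// stripUTF8BOM returns b without a leading UTF-8 BOM, if one is present.
func stripUTF8BOM(b []byte) []byte {
	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
		return b[3:]
	}
	return b
}

// bomCase is one input/expected-output pair (hypothetical type).
type bomCase struct {
	name     string
	in, want []byte
}

// bomCases lists inputs with and without a BOM, plus edge cases.
func bomCases() []bomCase {
	bom := []byte{0xEF, 0xBB, 0xBF}
	sketch := []byte("/* test */\nint x = 42;\n")
	return []bomCase{
		{"bom-prefixed sketch", append(append([]byte{}, bom...), sketch...), sketch},
		{"sketch without bom", sketch, sketch},
		{"bom only", bom, []byte{}},
		{"empty input", []byte{}, []byte{}},
		{"partial bom", []byte{0xEF, 0xBB}, []byte{0xEF, 0xBB}},
		{"bom not at start", append([]byte("x"), bom...), append([]byte("x"), bom...)},
	}
}

// failedCases returns the names of all cases whose output differs from want.
func failedCases() []string {
	var failed []string
	for _, c := range bomCases() {
		if got := stripUTF8BOM(c.in); !bytes.Equal(got, c.want) {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	if failed := failedCases(); len(failed) > 0 {
		fmt.Println("failed cases:", failed)
		os.Exit(1)
	}
	fmt.Println("all BOM cases passed")
}
```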


Does this PR introduce a breaking change?

No. The change only strips a BOM if present; no impact on existing UTF-8 (no BOM) files or other encodings.


Other information