Reintroduce header patterns for filetype detection by dmaluka · Pull Request #3208 · micro-editor/micro

added 11 commits

March 24, 2024 04:47

@dmaluka

The original meaning of foundDef was: "we already found the final syntax definition in a user's custom syntax file". After introducing signatures its meaning became: "we found some potential syntax definition in a user's custom syntax file, but we don't know yet if it's the final one". This makes the code confusing and actually buggy.

At least one resulting bug: if we find potential filename matches in the user's custom syntax files, we don't search for more matches in the built-in syntax files. That is wrong: we should keep searching for as many potential matches as possible, in both the user's and the built-in syntax files, and select the best one among them.

Fix that by restoring the original meaning of foundDef and updating the logic accordingly.

@dmaluka

No need to parse a syntax YAML file if we are not going to use it; that is a waste of CPU cycles.

@dmaluka

As a preparation for reintroducing header matches.

@dmaluka

Replacing header patterns with signature patterns was a mistake, since both have their own uses. So restore support for header regex, while keeping support for signature regex as well.

@dmaluka

Replacing header patterns with signature patterns was a mistake, since both are quite different from each other, and both have their uses. In fact, this caused a serious regression: for such files as shell scripts without *.sh extension but with #!/bin/sh inside, filetype detection does not work at all anymore.

Since both header and signature patterns are useful, reintroduce support for header patterns while keeping support for signature patterns as well and make both work nicely together.

Also, unlike in the old implementation (before signatures were introduced), ensure that filename matches take precedence over header matches, i.e. if there is at least one filename match found, all header matches are ignored. This makes the behavior more deterministic and prevents previously observed issues like micro-editor#2894 and micro-editor#3054: wrongly detected filetypes caused by some overly general header patterns.

Precisely, the new behavior is:

  1. if there is at least one filename match, use filename matches only
  2. if there are no filename matches, use header matches
  3. in both cases, try to use signatures to find the best match among multiple filename or header matches
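The three rules above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: the `syntaxDef` type, the `detect` and `sampleDefs` functions, the sample patterns, the 100-line signature limit, and the fallback to the first candidate when no signature matches are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// signatureLineLimit is an assumed value, chosen only for this sketch.
const signatureLineLimit = 100

// syntaxDef is a hypothetical, simplified stand-in for a parsed syntax file.
type syntaxDef struct {
	name      string
	filename  *regexp.Regexp // matched against the file name
	header    *regexp.Regexp // matched against the first line only
	signature *regexp.Regexp // matched against the first lines of the file
}

// detect applies the precedence rules: filename matches first, header
// matches only if there is no filename match, signatures as a tie-breaker.
func detect(defs []syntaxDef, fname string, lines []string) string {
	var byName, byHeader []syntaxDef
	for _, d := range defs {
		if d.filename != nil && d.filename.MatchString(fname) {
			byName = append(byName, d)
		} else if d.header != nil && len(lines) > 0 && d.header.MatchString(lines[0]) {
			byHeader = append(byHeader, d)
		}
	}

	// Rules 1 and 2: header matches are used only without filename matches.
	candidates := byName
	if len(candidates) == 0 {
		candidates = byHeader
	}
	if len(candidates) == 0 {
		return "unknown"
	}

	// Rule 3: with several candidates, prefer one whose signature matches.
	if len(candidates) > 1 {
		limit := len(lines)
		if limit > signatureLineLimit {
			limit = signatureLineLimit
		}
		for _, d := range candidates {
			if d.signature == nil {
				continue
			}
			for _, l := range lines[:limit] {
				if d.signature.MatchString(l) {
					return d.name
				}
			}
		}
	}
	// Assumption: fall back to the first candidate found.
	return candidates[0].name
}

// sampleDefs returns illustrative definitions; the patterns are made up.
func sampleDefs() []syntaxDef {
	return []syntaxDef{
		{name: "c", filename: regexp.MustCompile(`\.(c|h)$`)},
		{name: "c++", filename: regexp.MustCompile(`\.(cpp|h|hpp)$`),
			signature: regexp.MustCompile(`namespace|class\s`)},
		{name: "shell", filename: regexp.MustCompile(`\.sh$`),
			header: regexp.MustCompile(`^#!.*/(env +)?(ba)?sh$`)},
	}
}

func main() {
	defs := sampleDefs()
	fmt.Println(detect(defs, "foo.h", []string{"#pragma once", "namespace x {"}))
	fmt.Println(detect(defs, "run", []string{"#!/bin/sh"}))
	fmt.Println(detect(defs, "foo.c", []string{"#!/bin/sh"}))
}
```

With these sample definitions, `foo.h` containing a `namespace` line resolves to `c++` via the signature, the extensionless `run` script resolves to `shell` via its header, and `foo.c` stays `c` even though its first line looks like a shebang, because a filename match suppresses all header matches.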

@dmaluka

@dmaluka

@dmaluka

Turning header patterns into signature patterns in all syntax files was a mistake. The two are different things. In almost all syntax files those patterns are things like shebangs, i.e. things that:

  1. can be (and should be) used for detecting the filetype when there is no filename match (and that is actually the purpose of those patterns, so it's a regression that it doesn't work anymore).

  2. should only occur in the first line of the file, not in the first 100 lines or so.

In other words, the old header semantics was exactly what was needed for those filetypes, while the new signature semantics makes little sense for them.

So replace signature back with header in most syntax files. Keep signature only in C++ and Objective-C syntax files, for which it was actually introduced.

@dmaluka

@dmaluka

To make it more clear. Why Buffer?

@dmaluka

Purely cosmetic change: make the code a bit more readable by reducing its visual "density".

JoeKar

@dmaluka
