Some small omissions in prolog_read_source_term/4?

March 27, 2025, 12:25am 1

I’ve been working on adding automatic code formatting support to my LSP server (thanks to @logicmoo). It currently supports some basic formatting, but I’ve run into some issues with prolog_read_source_term/4. It gives very complete information about where terms are located in the file, but terms that are defined as operators are somewhat tricky to deal with. For instance, if a source file contains the following:

:- dynamic foo/1.
:- dynamic(foo/1).

Then the only difference when comparing the terms is that the second one ends one character later in the file. I’m currently resorting to such “hacks” to guess whether the original source file had parentheses. Quoted atoms are similar: 'foo' and foo are both the atom foo, so I have to observe that the span of the former is two characters longer than the length of the atom.

First question: Is there any better/more reliable way for me to ascertain if a term was written with parentheses?

Similarly, separators in lists aren’t indicated, so the lines

:- [1,    2].
:- [1   , 2].

aren’t distinguishable. This I’m less concerned about, since part of what the formatter does is normalize comma positions, but I’m kind of curious if there’s any way to handle this if I did care.

jan March 27, 2025, 8:06am 2

Good to hear. Would be great to have a reformatter. Note that there is a lot of useful stuff in JanWielemaker/reindent on GitHub (“Re-indent SWI-Prolog code”). It is a bit more limited in scope and was made merely to update the layout conventions of the SWI-Prolog libraries.

The subterm_positions option is a great building block that tells you where the various parts of the term come from, but indeed not the location of punctuation or the lexical representation of terms. I.e., 0.1 and 1.0e-1 are the same number and the subterm_positions only tell you where they are. Quoted atoms and strings also have different options for escaping.

What I normally do is to read the content into a string. Now you can use sub_string/5 to quickly get the lexical representation of literals as well as the space between two consecutive literals.
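A minimal sketch of that approach, assuming the source text is already in a string (read_file_to_string/3 would do for a file) and that the positions are character offsets into that string. The predicate names read_with_layout/3, pos_text/3, gap_text/4 and functional_notation/2 are made up for illustration; only read_term/3, open_string/2 and sub_string/5 are SWI-Prolog built-ins.

```prolog
% Hypothetical helpers, not part of any SWI-Prolog library.

% read_with_layout(+Source, -Term, -Pos)
% Read one term from the string Source and return its subterm positions.
read_with_layout(Source, Term, Pos) :-
    setup_call_cleanup(
        open_string(Source, In),
        read_term(In, Term, [subterm_positions(Pos)]),
        close(In)).

% pos_text(+Source, +Pos, -Text)
% Text is the source text covered by Pos.  Every position term
% (From-To, term_position/5, list_position/4, ...) carries From and To
% as its first two arguments, so arg/3 works on all of them.
pos_text(Source, Pos, Text) :-
    arg(1, Pos, From),
    arg(2, Pos, To),
    Len is To - From,
    sub_string(Source, From, Len, _, Text).

% gap_text(+Source, +Pos1, +Pos2, -Gap)
% Gap is the text between two consecutive subterms, e.g. the layout
% and the comma between two list elements.
gap_text(Source, Pos1, Pos2, Gap) :-
    arg(2, Pos1, End1),
    arg(1, Pos2, Start2),
    Len is Start2 - End1,
    sub_string(Source, End1, Len, _, Gap).

% functional_notation(+Source, +Pos)
% True if the compound at Pos starts with its functor and the functor
% is immediately followed by "(".  Assumption: that is how name(Args)
% syntax shows up, as opposed to prefix-operator syntax.
functional_notation(Source, term_position(From, _, From, FTo, _)) :-
    sub_string(Source, FTo, 1, _, "(").
```

With these, pos_text/3 on the position of a number or atom gives back what was actually typed (1.0e-1 rather than 0.1, 'foo' rather than foo), and gap_text/4 on the two element positions inside the list_position/4 term of `:- [1   , 2].` should return the spaces and the comma between the elements. For the dynamic example, functional_notation/2 has to be called on the position of the directive's argument, not on the position of the whole `:-` term. This is a sketch under the stated assumptions and has not been checked against every syntax corner case (block terms, dicts, and so on).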

@jan A follow-up to this: I’m now encountering an issue because, as best I can tell, there’s no way to get the location of the final full stop from the predicate. This causes problems when I want to move predicate sources around, as I’m forced to assume that the full stop comes right after the end of the term. With “weird” formatting that assumption breaks, and I end up either with no full stop or with an extra one.

e.g., for

weird_format(X) :-
    write(X)
.

prolog_read_source_term/4 reports the end of the term as being at the close paren of write/1, not at the trailing dot on the following line.

Is there anything I can do here other than seeking forward in the file until I find the full-stop?

jan May 28, 2025, 8:09am 4

A bit complicated. One way is to get the stream position right after reading the term. This gives the end of the full stop. Note that the full stop is defined as <non-symbol-char> . <white-space>. So, you get the position after the white space … but … comment is also white space, so

term  ./*this is comment*/

will return the position after the ‘/’. I don’t know whether that comment is included in the comment list that you get from reading the term. If so, it is not so hard: the dot is just before the first comment that starts after the term or, if that does not exist, one character before the stream position.
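Since it is not certain what the stream position or the comment list looks like around the dot, here is a sketch of a slightly different variant rather than the exact recipe above: start at the end of the term as reported by subterm_positions and walk forward, jumping over any comment that read_term/3 reports, until the first “.” is found. It assumes the source is available as a string, that the comment positions are character offsets into that string, and that the comment text includes its delimiters. The predicate names read_clause_layout/4, comment_range/2, full_stop_offset/4 and skip_to_dot/5 are made up for illustration.

```prolog
:- use_module(library(apply)).      % maplist/3

% Hypothetical helpers, not part of any SWI-Prolog library.

% read_clause_layout(+Source, -Term, -TermEnd, -Comments)
% Read one term from the string Source.  TermEnd is the character
% offset just after the term; Comments is the list of Pos-Text pairs
% produced by the comments/1 option of read_term/3.
% (No position is available at end of file, so this sketch does not
% handle reading end_of_file.)
read_clause_layout(Source, Term, TermEnd, Comments) :-
    setup_call_cleanup(
        open_string(Source, In),
        read_term(In, Term,
                  [ subterm_positions(Pos),
                    comments(Comments)
                  ]),
        close(In)),
    arg(2, Pos, TermEnd).

% comment_range(+PosText, -Range)
% Convert a Pos-Text comment into a Start-End character range.
comment_range(Pos-Text, Start-End) :-
    stream_position_data(char_count, Pos, Start),
    string_length(Text, Len),
    End is Start + Len.

% full_stop_offset(+Source, +TermEnd, +Comments, -Dot)
% Dot is the offset of the first "." at or after TermEnd that is not
% inside one of the reported comments.  Fails if there is none.
full_stop_offset(Source, TermEnd, Comments, Dot) :-
    maplist(comment_range, Comments, Ranges),
    string_length(Source, Len),
    skip_to_dot(Source, Len, TermEnd, Ranges, Dot).

skip_to_dot(Source, Len, Offset, Ranges, Dot) :-
    Offset < Len,
    (   memberchk(Offset-End, Ranges)          % a comment starts here
    ->  skip_to_dot(Source, Len, End, Ranges, Dot)
    ;   sub_string(Source, Offset, 1, _, ".")  % found the full stop
    ->  Dot = Offset
    ;   Next is Offset + 1,                    % layout: keep going
        skip_to_dot(Source, Len, Next, Ranges, Dot)
    ).
```

For the weird_format/1 example, calling read_clause_layout/4 and then full_stop_offset/4 should give the offset of the dot on the last line instead of the offset right after write(X). Comments between the term and the dot are part of what read_term/3 has read, so they are stepped over even if they contain a “.”. A comment after the dot is never reached, so it does not matter whether it is in the list. The sketch does not verify that everything it skips outside comments really is white space.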