PRE-PROPOSAL: Source and Encoding keyword

Roel Spilker r.spilker at gmail.com
Sat Mar 7 14:22:28 PST 2009


Good one :-) Javac won't even create a class file if the @Override annotation is present but shouldn't be there.

On Sat, Mar 7, 2009 at 7:22 PM, Igor Karp <igor.v.karp at gmail.com> wrote:

Roel,

well, these were not my ideas anyway ;-).
I would be equally unhappy using the javadoc approach.

And as a side note: @Override does influence the result of the compiler
already.

Igor

On Sat, Mar 7, 2009 at 9:55 AM, Roel Spilker <r.spilker at gmail.com> wrote:
> I'd say javadoc, as well as annotation, should never influence the result of
> the compiler. That's just not the right vehicle.
>
> Roel
>
> On Sat, Mar 7, 2009 at 6:27 PM, Igor Karp <igor.v.karp at gmail.com> wrote:
>>
>> Reinier,
>>
>> please see the comments inline.
>>
>> On Fri, Mar 6, 2009 at 11:39 PM, Reinier Zwitserloot
>> <reinier at zwitserloot.com> wrote:
>> > Igor,
>> >
>> > how could the command line options be expanded? Allow -encoding to
>> > specify a separate encoding for each file? I don't see how that can work.
>> For example: allow multiple -encoding options and add an optional path to
>> encoding: -encoding <encoding>[,<path>]
>> Where path can be either a package (settings applied to the package
>> and every package under it) or a single file for maximum precision.
>> So one can have:
>> -encoding X -encoding Y,a.b -encoding Z,a.b.c -encoding X,a.b.c.d.IAmSpecial
>> IAmSpecial.java will get encoding X,
>> everything else under a.b.c will get encoding Z,
>> everything else under a.b will get encoding Y
>> and the rest will get encoding X.
>> The same approach can be applied to -source.
>>
>> > There's no way I or anyone else is going to edit a build script (be it
>> > just javac, a home-rolled thing, ant, rake, make, maven, ivy, etcetera)
>> > to carefully enumerate every file's source compatibility level.
>> Sure, that's what argfiles are for: store the settings in a file and
>> use javac @argfile.
>>
>> And doing it as proposed above on a package level would make it more
>> manageable.
>> Remember in your proposal the only option is to specify it on a file
>> level (this is fixable I guess).
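
The "most specific path wins" rule from Igor's -encoding example can be sketched in Java as a longest-prefix lookup. This is only an illustration under assumptions: javac does not accept such an option syntax, and the class ScopedOptionResolver and its methods add and resolve are hypothetical names introduced for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical resolver for the proposed "value[,path]" option form; not part of javac. */
final class ScopedOptionResolver {
    // Maps a package or fully qualified class name to the value given for it.
    private final Map<String, String> byPrefix = new LinkedHashMap<>();
    // Value from an option occurrence without a path, e.g. "-encoding X".
    private String defaultValue;

    /** Registers one option argument, e.g. "Y,a.b" or just "X" for the global default. */
    void add(String optionArgument) {
        int comma = optionArgument.indexOf(',');
        if (comma < 0) {
            defaultValue = optionArgument;
        } else {
            byPrefix.put(optionArgument.substring(comma + 1),
                         optionArgument.substring(0, comma));
        }
    }

    /** Returns the value of the longest registered path covering the name, else the default. */
    String resolve(String qualifiedName) {
        String best = null;
        String bestValue = defaultValue;
        for (Map.Entry<String, String> e : byPrefix.entrySet()) {
            String prefix = e.getKey();
            // A path covers the name if it is the name itself or an enclosing package.
            boolean covers = qualifiedName.equals(prefix)
                    || qualifiedName.startsWith(prefix + ".");
            if (covers && (best == null || prefix.length() > best.length())) {
                best = prefix;
                bestValue = e.getValue();
            }
        }
        return bestValue;
    }
}
```

Feeding it the four arguments from the example ("X", "Y,a.b", "Z,a.b.c", "X,a.b.c.d.IAmSpecial") and resolving a.b.c.d.IAmSpecial, a.b.c.Foo and a.b.Bar gives X, Z and Y respectively. The same lookup would serve -source or -deprecation if they took a path.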
>>
>> > Changing the command line options also incurs the necessary wrath of
>> > all those build tool developers as they'd have to update their software
>> > to handle the new option (adding an option is a change too!)
>> Not more than changing the language itself.
>>
>> >
>> > Could you also elaborate on why you don't like it? For example, how can
>> > the benefits of having (more) portable source files, easier migration,
>> > and a much cleaner solution to e.g. the assert-in-javac1.4 be achieved
>> > with e.g. command line options, or do you not consider any of those
>> > worthwhile?
>> I fully support the goal. I even see it as a bit too narrow (see
>> below). But I do not see a need to change the language to achieve that
>> goal.
>>
>> On a conceptual level I see these options as metadata of the source
>> files and I don't like the idea of coupling it with the file.
>> One can avoid all this complexity of extra parsing by specifying the
>> encoding in an external file. This external file does not itself have
>> to be in that encoding. In fact it can be restricted to be
>> always in ASCII.
>>
>> I think the addition of an optional path and allowing multiple use of
>> the same option approach is much more scalable: it could be extended
>> to the other existing options (like -deprecation, -Xlint, etc.) and to
>> the options that might appear in the future.
>>
>> I wish I could concentrate on deprecations in a certain package and
>> ignore them everywhere else for now:
>> javac -deprecation,really.rusty.one ...
>> Finished with (or gave up on ;) that one and want to switch to the next
>> one:
>> javac -deprecation,another.old.one
>>
>> Igor Karp
>>
>> >
>> > As an aside, how do people approach project coin submissions?
>> > I tend to look at a proposal's value, which is its benefit divided by
>> > the disadvantages (end-programmer complexity to learn, amount of changes
>> > needed to javac and/or JVM, and restrictions on potential future
>> > expansions). One of the reasons I'm writing this up with Roel is because
>> > the disadvantages seemed to be almost nonexistent at the outset (the
>> > encoding stuff made it more complicated, but at least the complication
>> > is entirely hidden from java developers' eyes, so its value proposition
>> > is still aces in my book). If there's a goal to keep the total language
>> > changes, no matter how simple they are, down to a small set, then
>> > benefit regardless of disadvantages is the better yardstick.
>> >
>> > --Reinier Zwitserloot
>> >
>> >
>> >
>> > On Mar 7, 2009, at 08:15, Igor Karp wrote:
>> >
>> >> On Fri, Mar 6, 2009 at 10:03 PM, Reinier Zwitserloot
>> >> <reinier at zwitserloot.com> wrote:
>> >>>
>> >>> We have written up a proposal for adding a 'source' and 'encoding'
>> >>> keyword (alternatives to the -source and -encoding keywords on the
>> >>> command line; they work pretty much just as you expect). The keywords
>> >>> are context sensitive and must both appear before anything else other
>> >>> than comments to be parsed. In case the benefit isn't obvious: It is a
>> >>> great help when you are trying to port a big project to a new source
>> >>> language compatibility. Leaving half your sourcebase in v1.6 and the
>> >>> other half in v1.7 is pretty much impossible today, it's
>> >>> all-or-nothing. It should also be a much nicer solution to the 'assert
>> >>> in v1.4' dilemma, which I guess is going to happen to v1.7 as well,
>> >>> given that 'module' is most likely going to become a keyword. Finally,
>> >>> it makes java files a lot more portable; you no longer run into your
>> >>> strings looking weird when you move your Windows-1252 codefile java
>> >>> source to a mac, for example.
>> >>>
>> >>> Before we finish it though, some open questions we'd like some
>> >>> feedback on:
>> >>>
>> >>> A) Technically, starting a file with "source 1.4" is obviously silly;
>> >>> javac v1.4 doesn't know about the source keyword and would thus fail
>> >>> immediately. However, practically, it's still useful. Example: if
>> >>> you've mostly converted a GWT project to GWT 1.5 (which uses java 1.5
>> >>> syntax), but have a few files remaining on GWT v1.4 (which uses java
>> >>> 1.4 syntax), then tossing a "source 1.4;" in those older files
>> >>> eliminates all the generics warnings and serves as a reminder that you
>> >>> should still convert those at some point. However, it isn't -actually-
>> >>> compatible with a real javac 1.4. We're leaning to making "source
>> >>> 1.6;" (and below) legal even when using a javac v1.7 or above, but
>> >>> perhaps that's a bridge too far? We could go with magic comments but
>> >>> that seems like a very bad solution.
>> >>>
>> >>> also:
>> >>>
>> >>> Encoding is rather a hairy issue; javac will need to read the file to
>> >>> find the encoding, but to read a file, it needs to know about
>> >>> encoding! Fortunately, every single popular encoding on wikipedia's
>> >>> popular encoding list at:
>> >>>
>> >>> http://en.wikipedia.org/wiki/Characterencoding#Popularcharacterencodings
>> >>>
>> >>> will encode "encoding own-name-in-that-encoding;" the same as ASCII
>> >>> would, except for KOI-7 and UTF-7, (both 7 bit encodings that I doubt
>> >>> anyone ever uses to program java).
>> >>>
>> >>> Therefore, the proposal includes the following strategy to find the
>> >>> encoding statement in a java source file without knowing the encoding
>> >>> beforehand:
>> >>>
>> >>> An entirely separate parser (the encoding parser) is run repeatedly
>> >>> until the right encoding is found. First it'll decode the input with
>> >>> ISO-8859-1.
>> >>> If that doesn't work, UTF-16 (assume BE if no BOM, as per
>> >>> the java standard), then as UTF-32 (BE if no BOM), then the current
>> >>> behaviour (-encoding parameter's value if any, otherwise platform
>> >>> default encoding). This separate parser works as follows:
>> >>>
>> >>> 1. Ignore any comments and whitespace.
>> >>> 3. Ignore the pattern (regexp-like-syntax, ): source\s+[^\s]+\s*; - if
>> >>> that pattern matches partially but is not correctly completed, that
>> >>> parser run exits without finding an encoding, immediately.
>> >>> 4. Find the pattern: encoding\s+([^\s]+)\s*; - if that pattern matches
>> >>> partially but is not correctly completed, that parser run exits
>> >>> without finding an encoding, immediately. If it does complete, the
>> >>> parser also exits immediately and returns the captured value.
>> >>> 5. If it finds anything else, stop immediately, returning no encoding
>> >>> found.
>> >>>
>> >>> Once it's found something, the 'real' java parser will run using the
>> >>> found encoding (this overrides any -encoding on the command line).
>> >>> Note that the encoding parser stops quickly; for example, if it finds
>> >>> a stray \0 or e.g. the letter 'i' (perhaps the first letter of an
>> >>> import statement), it'll stop immediately.
>> >>>
>> >>> If an encoding is encountered that was not found during the standard
>> >>> decoding strategy (ISO-8859-1, UTF-16, UTF-32), but worked only due to
>> >>> a platform default/command line encoding param, (e.g. a platform that
>> >>> defaults to UTF-16LE without a byte order mark) a warning explaining
>> >>> that the encoding statement isn't doing anything is generated. Of
>> >>> course, if the encoding doesn't match itself, you get an error
>> >>> (putting "encoding UTF-16;" into a UTF-8 encoded file for example).
>> >>> If there is no encoding statement, the 'real' java parser does what it
>> >>> does now: Use the -encoding parameter of javac, and if that wasn't
>> >>> present, the platform default.
>> >>>
>> >>> However, there is 1 major and 1 minor problem with this approach:
>> >>>
>> >>> B) This means javac will need to read every source file many times to
>> >>> compile it.
>> >>>
>> >>> Worst case (no encoding keyword): 5 times.
>> >>> Standard case if an encoding keyword: 2 times (3 times if UTF-16).
>> >>>
>> >>> Fortunately all runs should stop quickly, due to the encoding parser's
>> >>> penchant to quit very early. Javacs out there will either stuff the
>> >>> entire source file into memory, or if not, disk cache should take care
>> >>> of it, but we can't prove beyond a doubt that this repeated parsing
>> >>> will have no significant impact on compile time. Is this a
>> >>> showstopper? Is the need to include a new (but small) parser into
>> >>> javac a showstopper?
>> >>>
>> >>> C) Certain character sets, such as ISO-2022, can make the encoding
>> >>> statement unreadable with the standard strategy if a comment including
>> >>> non-ASCII characters precedes the encoding statement. These situations
>> >>> are very rare (in fact, I haven't managed to find an example), so is
>> >>> it okay to just ignore this issue? If you add the encoding statement
>> >>> after a bunch of comments that make it invisible, and then compile it
>> >>> with the right -encoding parameter, you WILL get a warning that the
>> >>> encoding statement isn't going to help a javac on another platform /
>> >>> without that encoding parameter to figure it out, so you just get the
>> >>> current status quo: your source file won't compile without an explicit
>> >>> -encoding parameter (or if that happens to be the platform default).
>> >>> Should this be mentioned in the proposal?
>> >>> Should the compiler (and the proposal) put effort into generating a
>> >>> useful warning message, such as figuring out if it WOULD parse
>> >>> correctly if the encoding statement is at the very top of the source
>> >>> file, vs. suggesting to recode in UTF-8?
>> >>>
>> >>> and a final dilemma:
>> >>>
>> >>> D) Should we separate the proposals for source and encoding keywords?
>> >>> The source keyword is more useful and a lot simpler overall than the
>> >>> encoding keyword, but they do sort of go together.
>> >>
>> >> Separate. Another reason is: the argument of applying different
>> >> settings to different parts of the project is much less valid with
>> >> encoding than with source.
>> >>
>> >>>
>> >>> --Reinier Zwitserloot and Roel Spilker
>> >>>
>> >>>
>> >> Overall: I would prefer command line options enhanced to handle the
>> >> situation rather than language change.
>> >>
>> >> Igor Karp
>> >
>> >
>> >
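
The encoding-detection strategy quoted above (decode with ISO-8859-1, then UTF-16, then UTF-32, and run a tiny header parser on each attempt) might look roughly like the following Java sketch. It is not the authors' implementation: the class EncodingSniffer, its methods findDeclaration and sniff, and the fallback argument are hypothetical names. Two simplifications are assumptions of the sketch: the value patterns exclude ';' so the regex reads more directly, and the proposal's self-consistency error and "statement isn't doing anything" warning are left out.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed "encoding parser"; not part of javac. */
final class EncodingSniffer {
    // Whitespace, line comments and block comments, matched possessively so a
    // failed match does not backtrack through long runs of whitespace.
    private static final String SKIP = "(?:\\s++|//[^\\n]*+|/\\*.*?\\*/)*+";

    // Optional "source X;" statement, then the required "encoding X;" statement.
    private static final Pattern HEADER = Pattern.compile(
            SKIP + "(?:source\\s+[^\\s;]+\\s*;" + SKIP + ")?"
                 + "encoding\\s+([^\\s;]+)\\s*;",
            Pattern.DOTALL);

    // Decoding order taken from the quoted proposal. Java's "UTF-16" and "UTF-32"
    // decoders honour a BOM and otherwise assume big-endian.
    private static final String[] CANDIDATES = {"ISO-8859-1", "UTF-16", "UTF-32"};

    /** Returns the declared encoding name, or null if the text does not start with one. */
    static String findDeclaration(String decoded) {
        Matcher m = HEADER.matcher(decoded);
        // lookingAt() anchors at the start, so anything unexpected ends the run at once.
        return m.lookingAt() ? m.group(1) : null;
    }

    /** Tries each candidate decoding in order; returns the fallback if none declares an encoding. */
    static String sniff(byte[] source, String fallback) {
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this runtime does not provide
            }
            String declared = findDeclaration(new String(source, Charset.forName(name)));
            if (declared != null) {
                return declared;
            }
        }
        return fallback;
    }
}
```

Here the fallback stands in for the last step of the strategy, the -encoding value or the platform default. A file that begins with an import statement fails all three attempts almost immediately, which is the early exit that question B relies on.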


More information about the coin-dev mailing list