Steps to get started creating a new language binding?

Yes, build-time, i.e. sometime no later than compile-time.

I read that as an orthogonal matter: where you have a language with multiple compilers/interpreters available, there may be a choice between using standard features of the language that should work with all compilers/interpreters (implementation-agnostic) and using features specific to a particular compiler/interpreter (implementation-specific). This isn’t even a simple binary choice. You could use some implementation-agnostic features and some implementation-specific features. For example, in Giraffe Library, which generates platform-independent SML source code, the signatures are implementation-agnostic but the structures are implementation-specific (necessarily so, because there is no standard C FFI in SML: each compiler has its own).

A reasonable suggestion: an interpreter will resolve dependencies on the target platform (in the same process in which the code runs) and this information can be efficiently obtained via libgirepository. (Note that if the language allows type annotations, you can’t write alias types, e.g. Gtk.Allocation, using libgirepository: you would have to refer to its underlying type Gdk.Rectangle. Type aliases are available only in the GIR files.)

I don’t know of such a tool and I suspect nobody’s attempted to create a general tool because there is so much variation in the possible mechanisms for language bindings. (For example, there may be approaches that don’t generate any code because an interpreter is extended to interact with libgirepository, using libffi to dynamically create calls to functions.)

Yes, a complete understanding of these topics is required.

Looking briefly at Janet’s C API described in the link you provided, it looks like you want to generate a Janet module, one module for each GObject namespace, that is compiled up front. Your Janet functions will be calling the C functions and you’ll need a #include directive at the top for the C functions you depend on, e.g. #include <gtk/gtk.h>. That in itself is enough to require you to generate code from GIR files because the C include information is not available via libgirepository.
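As a rough sketch of that step: GIR files record the C headers for a namespace in `<c:include name="..."/>` elements, so a generator can turn those names into the directives at the top of each module. The helper below assumes the header names have already been extracted from the GIR file; `emit_includes` is a hypothetical name for illustration, not part of any existing tool.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator helper: writes one "#include <header>" line per
   entry of `headers` into `buf`. The header names are assumed to come from
   the <c:include name="..."/> elements of a GIR file.
   Returns the number of characters written (excluding the terminating NUL),
   or -1 if `buf` is too small. */
static int emit_includes(char *buf, size_t size,
                         const char *const *headers, size_t count)
{
  size_t used = 0;

  if (size == 0)
    return -1;
  buf[0] = '\0';

  for (size_t i = 0; i < count; i++) {
    int n = snprintf(buf + used, size - used, "#include <%s>\n", headers[i]);
    if (n < 0 || (size_t) n >= size - used)
      return -1;  /* output would have been truncated */
    used += (size_t) n;
  }
  return (int) used;
}
```

Called with the single header name `gtk/gtk.h`, this fills the buffer with the directive mentioned above. Parsing the GIR XML itself is left out of the sketch.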

Even then, there is a key decision to make. Do you:

  1. distribute pre-generated modules that users can compile on their machines or
  2. expect users to generate these modules on their machines, i.e. run your code generator?

If you do 1, then you will need to guard all functions according to their availability, e.g.

#if GTK_CHECK_VERSION(3, 16, 0)
...
janet_gtk_text_buffer_insert_markup (...)
{
  ...unwrap...
  gtk_text_buffer_insert_markup (...);
  ...wrap...
}
#endif

If you do 2, then you shouldn’t need to guard functions because users should use the GIR files for the version they have installed.
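To make the guarding in option 1 concrete without depending on the GTK headers, here is a self-contained sketch of the same pattern. Everything prefixed `DEMO_` or `demo_` is a hypothetical stand-in: a real module would use `GTK_CHECK_VERSION` from `<gtk/gtk.h>`, and the version numbers here are chosen only so that the guarded wrapper is left out.

```c
/* Hypothetical stand-ins for a library's version macros. In a real module
   these would come from the library headers, e.g. <gtk/gtk.h>. */
#define DEMO_MAJOR_VERSION 3
#define DEMO_MINOR_VERSION 14
#define DEMO_MICRO_VERSION 0

/* Same shape as GTK_CHECK_VERSION: true if the version being compiled
   against is at least major.minor.micro. */
#define DEMO_CHECK_VERSION(major, minor, micro)                          \
  (DEMO_MAJOR_VERSION > (major) ||                                       \
   (DEMO_MAJOR_VERSION == (major) && DEMO_MINOR_VERSION > (minor)) ||    \
   (DEMO_MAJOR_VERSION == (major) && DEMO_MINOR_VERSION == (minor) &&    \
    DEMO_MICRO_VERSION >= (micro)))

/* A wrapper for a function that exists in every supported version. */
static const char *demo_wrap_insert(void)
{
  return "insert";
}

/* A wrapper for a function introduced in 3.16: compiled only when the
   headers are new enough, so older installations still build. */
#if DEMO_CHECK_VERSION(3, 16, 0)
static const char *demo_wrap_insert_markup(void)
{
  return "insert_markup";
}
#define DEMO_HAVE_INSERT_MARKUP 1
#else
#define DEMO_HAVE_INSERT_MARKUP 0
#endif

/* Number of wrappers that were actually compiled in. */
static int demo_wrapper_count(void)
{
  return 1 + DEMO_HAVE_INSERT_MARKUP;
}
```

With the pretend version set to 3.14.0, only the unguarded wrapper is compiled. The table of functions registered with the interpreter would need the same guards around its entries.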

The code generator in Giraffe Library could be useful as a starting point because it already generates C wrapper functions for one of the compilers, which is a similar task. However, I would advise doing several examples by hand first, writing the Janet modules manually, before generating anything. You will need to understand memory management and using Janet’s GC. Even for the basic Hello World example, you will need to put a mechanism in place for calling Janet functions from GObject closures.

For example, libgirepository tells you about the arguments of a C function - their type, direction, nullability, etc. You would use this to determine how to wrap/unwrap the arguments in the Janet wrapper function.
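A minimal sketch of that decision, using simplified stand-in types rather than the real libgirepository ones. In libgirepository itself the direction and nullability would come from calls such as `g_arg_info_get_direction` and `g_arg_info_may_be_null`; the `ArgInfo` and `MarshalPlan` structs and `plan_for_arg` below are assumptions made up for illustration.

```c
#include <stdbool.h>

/* Simplified stand-in for the direction reported by the introspection data. */
typedef enum { DIR_IN, DIR_OUT, DIR_INOUT } ArgDirection;

/* Hypothetical summary of what the metadata says about one C argument. */
typedef struct {
  const char *name;
  ArgDirection direction;
  bool may_be_null;
} ArgInfo;

/* What the generated wrapper has to do for that argument. */
typedef struct {
  bool unwrap_before_call;  /* convert the script value to a C value */
  bool wrap_after_call;     /* convert the C value back to a script value */
  bool accept_nil;          /* allow the caller to pass nil */
} MarshalPlan;

static MarshalPlan plan_for_arg(const ArgInfo *arg)
{
  MarshalPlan plan;

  /* 'in' and 'inout' arguments carry a value into the C function. */
  plan.unwrap_before_call =
    arg->direction == DIR_IN || arg->direction == DIR_INOUT;

  /* 'out' and 'inout' arguments carry a value back to the caller. */
  plan.wrap_after_call =
    arg->direction == DIR_OUT || arg->direction == DIR_INOUT;

  /* nil is only meaningful for values the caller actually supplies. */
  plan.accept_nil = arg->may_be_null && plan.unwrap_before_call;

  return plan;
}
```

A real generator would also have to take the argument's type and ownership transfer into account; this sketch covers only direction and nullability.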

There are different possible architectures but documentation as you describe for common architectures would be very useful. Possibly some sort of decision chart helping you decide on an approach would be useful.

I don’t know about Swig. A quick read suggests that it would work from the C source code. If so, you would have to provide some configuration for Swig for every library that would have to be maintained going forward, which sounds like quite a burden. Also, the introspection metadata is provided in gtk-doc comments and is more abstract than what is available in the C code, e.g. C function arguments are classified as ‘in’/‘inout’/‘out’ parameters but I can’t see how Swig would know about that.