Options for having model, parsers and serializers for a given data-format generated in multiple languages?

Feb 9, 2025 - 07:11

I am a member of the Apache PLC4X (incubating) project, where we are currently implementing multiple industrial PLC protocols. While we initially focused on creating Java versions of these, we are now starting to work on providing C++ and other language versions as well.

Instead of manually syncing and maintaining these implementations, we would rather define the message structures of these protocols in a generic way and have the model, parsers and serializers generated from these definitions.

I have looked at several options: 1) Protobuf 2) Thrift 3) DFDL

The problems with these are the following:

1) Protobuf seems to be ideal for designing a model and having the model, serializers and parsers generated from it. With Protobuf it is easy to define a model and ensure I can serialize an object and deserialize it in any language. However, I don't have full control over the transport format, because Protobuf dictates its own wire encoding. For example, if a message had to contain the constant byte value 0xFF, this would be a problem.

2) Thrift seems to be more focused on services and the models used by these services. The same limitation seems to apply as for Protobuf: I have no full control over the transport format.

3) DFDL seems to be exactly what I'm looking for, as I want a language to describe my data format. Unfortunately, I could only find projects like Daffodil, which seem to use DFDL definitions to parse any data format into an XML-like DOM structure. For performance and memory reasons we would rather not do that. Other than that, I couldn't find any usable tooling.

I also had a look at Avro and Kaitai Struct, but Avro seems to have the same issues for my use case as Protobuf, and the guys from Kaitai told me serialization was still experimental.
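
To make the "full control over the transport format" requirement concrete, here is a minimal hand-written Java sketch of the kind of code I would like to have generated. The message layout (a constant 0xFF marker byte, a 16-bit big-endian length, then the payload) and the class name `ExampleMessage` are made up for illustration and are not taken from any real PLC protocol.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical message layout, invented for this example:
//   byte 0      constant marker 0xFF
//   bytes 1-2   payload length, unsigned 16-bit, big-endian
//   bytes 3..   payload
final class ExampleMessage {
    private static final int MARKER = 0xFF;
    private final byte[] payload;

    ExampleMessage(byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload does not fit a 16-bit length");
        }
        this.payload = payload.clone();
    }

    byte[] getPayload() {
        return payload.clone();
    }

    // Writes exactly the bytes described above, nothing more.
    byte[] serialize() {
        ByteBuffer buf = ByteBuffer.allocate(3 + payload.length).order(ByteOrder.BIG_ENDIAN);
        buf.put((byte) MARKER);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    // Reads the same layout and rejects input with a wrong marker or length.
    static ExampleMessage parse(byte[] bytes) {
        if (bytes.length < 3) {
            throw new IllegalArgumentException("truncated header");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        int marker = buf.get() & 0xFF;
        if (marker != MARKER) {
            throw new IllegalArgumentException("expected marker 0xFF, got " + marker);
        }
        int length = buf.getShort() & 0xFFFF;
        if (buf.remaining() < length) {
            throw new IllegalArgumentException("truncated payload");
        }
        byte[] data = new byte[length];
        buf.get(data);
        return new ExampleMessage(data);
    }
}
```

The point is that every byte on the wire is dictated by the protocol, so a generator has to take the byte layout as its input rather than choose an encoding of its own.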

My ideal workflow would be (using Maven):

1) For every protocol, I define the DFDL documents describing the different types of messages for that protocol.

2) I define multiple protocol implementation modules (one for each language).

3) I use a Maven plugin in each of these modules to generate the code for that particular language from those central DFDL definitions.
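
As a rough illustration of the "one central definition, one generator per language" idea behind this workflow, the sketch below emits a Java class and a C++ struct from the same in-memory definition. Everything in it is hypothetical: `FieldDef`, `MessageDef` and `ModelGenerator` are made-up names, the definition is a plain Java object rather than a DFDL document, and only the model is generated, not the parsers or serializers.

```java
import java.util.List;

// Hypothetical stand-in for one field of a central protocol definition.
final class FieldDef {
    final String name;
    final int bits; // unsigned width: 8, 16 or 32

    FieldDef(String name, int bits) {
        if (bits != 8 && bits != 16 && bits != 32) {
            throw new IllegalArgumentException("unsupported width: " + bits);
        }
        this.name = name;
        this.bits = bits;
    }
}

// Hypothetical stand-in for one message type of a protocol.
final class MessageDef {
    final String name;
    final List<FieldDef> fields;

    MessageDef(String name, List<FieldDef> fields) {
        this.name = name;
        this.fields = fields;
    }
}

final class ModelGenerator {
    // Emits Java source for the model class.
    static String toJava(MessageDef msg) {
        StringBuilder sb = new StringBuilder("public class " + msg.name + " {\n");
        for (FieldDef f : msg.fields) {
            sb.append("    public ").append(javaType(f.bits)).append(' ')
              .append(f.name).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    // Emits C++ source for the same model, using fixed-width unsigned types.
    static String toCpp(MessageDef msg) {
        StringBuilder sb = new StringBuilder("struct " + msg.name + " {\n");
        for (FieldDef f : msg.fields) {
            sb.append("    uint").append(f.bits).append("_t ")
              .append(f.name).append(";\n");
        }
        return sb.append("};\n").toString();
    }

    // Java has no unsigned primitives, so use the next wider signed type.
    private static String javaType(int bits) {
        switch (bits) {
            case 8: return "short";
            case 16: return "int";
            default: return "long";
        }
    }
}
```

In the workflow described above, each language module would run its own generator (for example from a Maven plugin) against the shared definitions instead of calling `toJava` and `toCpp` side by side.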