Custom serialization :: Red Planet Labs Documentation (original) (raw)
Serialization for custom types is implemented using the RamaCustomSerialization interface. An implementation of this interface targets a particular type and provides methods to serialize and deserialize instances of that type. It is independent from the types themselves.
public abstract class ThriftSerialization implements RamaCustomSerialization<TBase> {
private final Map<Short, Class> _idToType = new HashMap<>();
private final Map<Class, Short> _typeToId = new HashMap<>();
protected ThriftSerialization() {
Map<Integer, Class> m = typeIds();
for(Integer id: m.keySet()) _idToType.put(id.shortValue(), m.get(id));
for(Short id: _idToType.keySet()) _typeToId.put(_idToType.get(id), id);
}
@Override
public void serialize(TBase obj, DataOutput out) throws Exception {
Short id = _typeToId.get(obj.getClass());
if(id==null) throw new RuntimeException("Could not find type id for " + obj.getClass());
out.writeShort(id);
byte[] serialized = new TSerializer(new TCompactProtocol.Factory()).serialize(obj);
out.writeInt(serialized.length);
out.write(serialized);
}
@Override
public TBase deserialize(DataInput in) throws Exception {
TBase obj = (TBase) _idToType.get(in.readShort()).newInstance();
byte[] arr = new byte[in.readInt()];
in.readFully(arr);
new TDeserializer(new TCompactProtocol.Factory()).deserialize(obj, arr);
return obj;
}
@Override
public Class targetType() {
return TBase.class;
}
protected abstract Map<Integer, Class> typeIds();
}
There are three methods on the RamaCustomSerialization
interface. targetType
returns the type this object serializes. In this case it returns TBase
, the base class for all Thrift objects. serialize
writes an object of the target type to a DataOutput
, and deserialize
reads an object of the target type from DataInput
.
ThriftSerialization
implements serialization for all Thrift objects. In order to deserialize an object using Thrift, you need to first create an empty instance of the specific type being deserialized. So this code needs to distinguish different types in the serialized output. It does this using type IDs. An application using ThriftSerialization
would extend the class with an implementation of typeIds
to provide that information. Here’s an example of what that looks like:
public class MyApplicationThriftSerialization extends ThriftSerialization {
@Override
protected Map<Integer, Class> typeIds() {
Map<Integer, Class> ret = new HashMap<>();
ret.put(0, ProfileEdit.class);
ret.put(1, PageView.class);
ret.put(2, Post.class);
return ret;
}
}
In this case ProfileEdit
, PageView
, and Post
refer to the specific Thrift types used by the application.
Needing to distinguish types on the wire is a common problem when implementing serialization. Explicit type IDs are one solution you could use, but there are other ways you could approach it. You could generate type IDs using hashes of the classnames, with the size of the hash being a tradeoff between serialization compactness and collision resistance. If you used a two-byte hash, you’re only adding two bytes per serialized object, but the chance of multiple types hashing to the same value starts becoming high after only 50 types (see Birthday problem). If you used an eight-byte hash, the chance of collisions is effectively zero but serialization is correspondingly more expensive.
Custom serialization implementations are registered with Rama through the config provided for module launch or when creating a client-side RamaClusterManager. The key "custom.serializations"
is given a list of class names of RamaCustomSerialization
implementations you wish to have active. The classes provided must have a zero-argument constructor so that Rama can create instances of them. For example, to deploy a module using MyApplicationThriftSerialization
, you could first specify the config in a file overrides.yaml
like so:
custom.serializations:
- "com.rpl.myapp.serialization.MyApplicationThriftSerialization"
Then you would deploy the module with a command like the following:
rama deploy \
--action launch \
--jar my-application-jar.jar \
--module com.rpl.myapp.MyModule \
--configOverrides overrides.yaml \
--tasks 128 \
--threads 32 \
--workers 8 \
--replicationFactor 3
To make a client that makes use of MyApplicationThriftSerialization
, you would create the RamaClusterManager
like so:
Map config = new HashMap();
List serializations = new ArrayList();
serializations.add("com.rpl.myapp.serialization.MyApplicationThriftSerialization");
config.put("custom.serializations", serializations);
RamaClusterManager manager = RamaClusterManager.open(config);
It’s critical that all modules and clients that interact with those objects have this serialization registered. Since "custom.serializations"
is a list, you can have as many custom serialization implementations registered as you need. See Operating Rama clusters for more details on deploying modules and using RamaClusterManager
.
Configuring custom serializations in a test context with InProcessCluster
is slightly different. In this context custom serializations are configured as part of the InProcessCluster
instead of as part of individual modules or clients. Here’s an example of using MyApplicationThriftSerialization
with InProcessCluster
:
List<Class> serializations = new ArrayList<>();
serializations.add(MyApplicationThriftSerialization.class);
try(InProcessCluster ipc = InProcessCluster.create(serializations)) {
RamaModule module = new MyModule();
ipc.launchModule(module, new LaunchConfig(4, 4));
}
Finally, here are a few more properties a custom serialization implementation must have:
- Since every worker and client will have a separate instance, the serialization implementation must be deterministic.
- Must be thread-safe since the same instance can be used across multiple threads.
- Multiple
RamaCustomSerialization
implementations should not match on the same types, including parent types. Which one is used in that case is undefined.