ArangoDB's Inspection Framework

Markus Pfeiffer cb8ea0bb6f Reversed serialisation into sexpression		2023-01-25 23:50:01 +00:00

Type inspection

At the moment this library is rather tightly coupled with VelocyPack since the primary goal was to simplify (de)serialization from and to VelocyPack. It is planned to reduce this coupling so that the Inspector concept can also be used in other scenarios (e.g., deserializing from agency nodes).

Acknowledgements

This library is heavily inspired by the type inspection library from the C++ Actor Framework (CAF). Kudos to Dominik Charousset & Co for their great work!

Data Model

The current data model is largely based on the types supported by VelocyPack.

built-in types: double, boolean, string, and signed and unsigned integer types of 8, 16, 32 and 64 bits.
lists: dynamically sized container types such as std::vector, std::list, etc.
tuples: fixed-size container types such as std::tuple, std::array or built-in C array types.
maps: dynamically sized container types with key/value pairs such as std::map. At the moment only strings are supported as keys.
objects: user-defined types. An object has one or more fields. Fields have a name and may be optional. Further, fields may take on a fixed number of different types.

There is also support for std::optional, std::unique_ptr, std::shared_ptr, velocypack::SharedSlice and std::variant, as well as "unsafe" types like std::string_view and velocypack::Slice, which we will cover later.

Limitations

At the moment all types used for deserialization must be default constructible.

Inspecting Objects

The inspection API lets users describe C++ objects. Users can either provide free functions named inspect that are picked up via ADL, or specialize arangodb::inspection::Access.

In both cases, users call member functions on an Inspector that provides a domain-specific language (DSL) to describe the data (e.g., the structure of a C++ object, the values of an enum, the alternatives of a std::variant, etc.). Different inspectors can then process these descriptions to do different things. At the moment there are only two inspectors, which (de)serialize from/to velocypack, but the concept is very generic and allows us to add more inspectors for other cases.

Writing `inspect` Overloads

Adding overloads for inspect generally provides the simplest way since it requires the least amount of boilerplate code.

For example, consider this simple struct with an inspect function in which we simply pass all member variables as fields to the inspector:

struct VertexDescription {
  std::string_view id;
  uint64_t depth;
  double weight;
};

template<class Inspector>
auto inspect(Inspector& f, VertexDescription& description) {
  return f.object(description)
      .fields(f.field("vId", description.id),
              f.field("depth", description.depth),
              f.field("weight", description.weight));
}

Objects are simply containers for fields, which in turn contain values. By providing an inspect overload, we recursively traverse all fields and "inspect" them as well.
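To make the idea of "an inspector processing a description" more concrete, here is a minimal sketch. FieldNameInspector is a hypothetical toy written for this example and is not one of the library's inspectors; it only mimics the object(...).fields(...) and field(...) calls used above and records the field names it is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy inspector, not part of the library: it ignores the values
// and only records the names of the fields it is shown.
struct FieldNameInspector {
  std::vector<std::string> names;

  struct Field {
    std::string_view name;
  };

  struct Object {
    FieldNameInspector& self;

    // Appends the field names in the order in which they were passed.
    template<class... Fields>
    bool fields(Fields const&... fs) {
      (self.names.emplace_back(fs.name), ...);
      return true;
    }
  };

  template<class T>
  Object object(T&) { return Object{*this}; }

  template<class T>
  Field field(std::string_view name, T&) { return Field{name}; }
};

struct VertexDescription {
  std::string_view id;
  std::uint64_t depth;
  double weight;
};

template<class Inspector>
auto inspect(Inspector& f, VertexDescription& description) {
  return f.object(description)
      .fields(f.field("vId", description.id),
              f.field("depth", description.depth),
              f.field("weight", description.weight));
}

// Runs the toy inspector over a VertexDescription and returns the names.
std::vector<std::string> fieldNamesOfVertexDescription() {
  FieldNameInspector f;
  VertexDescription v{"v1", 3, 0.5};
  [[maybe_unused]] bool ok = inspect(f, v);
  assert(ok);
  return f.names;
}
```

The same inspect function could be handed to any other type that offers the same member functions, which is what allows one description to serve several purposes.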

Not every type needs to expose itself as an object, though. A good example is our internal identifiers (everything that derives from basics::Identifier):

class Identifier {
  ...
  std::uint64_t id;
};

The type Identifier is basically a strong typedef to improve type safety when writing code. But if we describe it as an object with a field, then to an inspector it looks as follows:

  object(type: "Identifier") {
    field(name: "id") {
      value(type: "uint64_t") {
        ...
      }
    }
  }

Now, this type has little use on its own and only introduces unnecessary overhead. We do not need the additional object wrapper and could make it completely transparent to inspectors. This can be achieved by writing the inspect overload for Identifier as follows:

  template <class Inspector>
  auto inspect(Inspector& f, Identifier& x) {
    return f.apply(x.id);
  }
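The difference between the two descriptions can be made visible with a small sketch. ShapeInspector, WrappedId and TransparentId are hypothetical names invented for this example; the toy inspector only renders the shape of a description as text and is not how the library's inspectors are implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical toy inspector that renders the shape of a description as text.
struct ShapeInspector {
  std::string out;

  struct Field {
    std::string text;
  };

  struct Object {
    ShapeInspector& self;

    // Wraps all fields in an "object{...}" marker.
    template<class... Fields>
    bool fields(Fields const&... fs) {
      self.out += "object{";
      ((self.out += fs.text + ";"), ...);
      self.out += "}";
      return true;
    }
  };

  template<class T>
  Object object(T&) { return Object{*this}; }

  Field field(std::string_view name, std::uint64_t&) {
    return Field{std::string(name) + ":uint64"};
  }

  // A plain value is rendered directly, without any wrapper.
  bool apply(std::uint64_t&) {
    out += "uint64";
    return true;
  }
};

struct WrappedId { std::uint64_t id; };      // described as an object
struct TransparentId { std::uint64_t id; };  // described via apply

template<class Inspector>
auto inspect(Inspector& f, WrappedId& x) {
  return f.object(x).fields(f.field("id", x.id));
}

template<class Inspector>
auto inspect(Inspector& f, TransparentId& x) {
  return f.apply(x.id);
}

// Returns the textual shape the toy inspector sees for a value of type T.
template<class T>
std::string shapeOf(T value) {
  ShapeInspector f;
  [[maybe_unused]] bool ok = inspect(f, value);
  assert(ok);
  return f.out;
}
```

With these definitions, shapeOf(WrappedId{7}) yields "object{id:uint64;}" while shapeOf(TransparentId{7}) yields just "uint64".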

Types with Getter and Setter Access

This is currently not implemented. Please let me know in case you need support for this.

Fallbacks, Invariants and custom Context

For each field, we may provide a fallback value or fallback factory for optional fields, as well as a predicate that checks invariants on the data. For example, consider the following class and its implementation of inspect:

struct LogTargetConfig {
  std::size_t writeConcern = 1;
  std::size_t softWriteConcern = 1;
  bool waitForSync = false;
};

bool greaterZero(std::size_t v) { return v > 0; }

template<class Inspector>
auto inspect(Inspector& f, LogTargetConfig& x) {
  return f.object(x)
      .fields(f.field("writeConcern", x.writeConcern).invariant(greaterZero),
              f.field("softWriteConcern", x.softWriteConcern)
                  .fallback(std::ref(x.writeConcern)).invariant(greaterZero),
              f.field("waitForSync", x.waitForSync).fallback(f.keep()))
      .invariant([](LogTargetConfig& c) { return c.writeConcern >= c.softWriteConcern; });
}

By default all attributes specified in the description have to be present when loading. If an attribute is not present but has a fallback value (or factory) defined, then the field is instead initialized with the fallback. Fallback values are taken by value, but as the example shows we can use std::ref to capture references. The attributes are processed in the same order they are specified. writeConcern is a mandatory attribute while softWriteConcern is optional, but if softWriteConcern is not specified explicitly, we want it to default to the exact same value as writeConcern. Since writeConcern is specified first, it is also processed first. For softWriteConcern we capture a reference to writeConcern as fallback, so when we process softWriteConcern and cannot find it, we take the fallback value from the already processed writeConcern. Alternatively one can use Inspector::keep() to indicate that we simply want to keep the current value of that field (see the fallback call for "waitForSync" in the example).

writeConcern and softWriteConcern must both be greater than zero, which is verified by the provided invariant function. invariant takes a callable that receives an argument of the same type as the field and must return either a bool or an arangodb::inspection::Status to indicate whether the check succeeded. The advantage of using the Status type is that one can provide a meaningful error message with more context.

In addition, softWriteConcern must not be greater than writeConcern. Invariants like this can only be verified once all fields have been processed, which is why you can append another invariant call to the result of fields. This invariant function will then receive a reference to the fully initialized object.
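The processing order described above can be spelled out by hand. The following is only a sketch of the semantics, assuming a hypothetical input type (a std::map from attribute name to value) and a hypothetical loadConfig function; it does not use the inspection library and omits waitForSync for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

struct LogTargetConfig {
  std::size_t writeConcern = 1;
  std::size_t softWriteConcern = 1;
};

// Hypothetical input: attribute name -> value.
using Input = std::map<std::string, std::size_t>;

// Hand-written stand-in for what a loading inspector does with the
// description. Returns an error message, or std::nullopt on success.
std::optional<std::string> loadConfig(Input const& in, LogTargetConfig& x) {
  // 1. writeConcern is mandatory and is processed first.
  auto it = in.find("writeConcern");
  if (it == in.end()) {
    return std::string("missing required attribute writeConcern");
  }
  x.writeConcern = it->second;
  if (x.writeConcern == 0) {
    return std::string("field invariant failed: writeConcern");
  }

  // 2. softWriteConcern is optional; the fallback refers to the value of
  //    writeConcern that was loaded in step 1.
  it = in.find("softWriteConcern");
  x.softWriteConcern = (it != in.end()) ? it->second : x.writeConcern;
  if (x.softWriteConcern == 0) {
    return std::string("field invariant failed: softWriteConcern");
  }

  // 3. The object invariant runs once all fields have been processed.
  if (x.writeConcern < x.softWriteConcern) {
    return std::string("object invariant failed");
  }
  return std::nullopt;
}

void usageExample() {
  LogTargetConfig c;
  Input onlyWriteConcern{{"writeConcern", 3}};
  auto error = loadConfig(onlyWriteConcern, c);
  assert(!error.has_value());
  assert(c.softWriteConcern == 3);  // fallback taken from writeConcern
}
```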

Invariants are only checked when isLoading is true (see "Splitting Save and Load").

If all you want to do is check the invariants of a manually filled struct, there is the ValidateInspector, which does just that. Since invariants are only checked when isLoading is true, isLoading is also set for the ValidateInspector, even though it does not modify the given object.

Being able to define fallback values and invariants is fine, but sometimes those depend on some configuration that is not available in the inspect function. This is where custom contexts come in. Suppose we want writeConcern to default to some value that can be configured, e.g., via the command line. Then you can pass an object that contains your defaultWriteConcern to the constructor of the respective Inspector, which will store a reference to it. This reference is then available via getContext, as shown in the following example, which uses a fallbackFactory to initialize writeConcern (this is just so we also have an example of how to use fallbackFactory; fallback would work just as well):

template<class Inspector>
auto inspect(Inspector& f, LogTargetConfig& x) {
  auto& context = f.getContext();
  return f.object(x)
      .fields(f.field("writeConcern", x.writeConcern).invariant(greaterZero)
                  .fallbackFactory([&]() { return context.defaultWriteConcern; }),
              f.field("softWriteConcern", x.softWriteConcern)
                  .fallback(std::ref(x.writeConcern)).invariant(greaterZero),
              f.field("waitForSync", x.waitForSync).fallback(f.keep()))
      .invariant([](LogTargetConfig& c) { return c.writeConcern >= c.softWriteConcern; });
}

Embedded fields

In some cases we may want to "reuse" the inspect function of some type, e.g., in case of inheritance. This can be achieved using "field embedding":

struct Inner {
  std::string s;
};
template<class Inspector>
auto inspect(Inspector& f, Inner& v) {
  return f.object(v).fields(f.field("s", v.s));
}

struct Base {
  int x;
};
template<class Inspector>
auto inspect(Inspector& f, Base& v) {
  return f.object(v).fields(f.field("x", v.x));
}

struct Derived : Base {
  int y;
  Inner i;
};
template<class Inspector>
auto inspect(Inspector& f, Derived& v) {
  return f.object(v).fields(
    f.embedFields(static_cast<Base&>(v)),
    f.field("y", v.y),
    f.embedFields(v.i));
}

Here the inspect function for Derived embeds the fields for Base and Inner, so we end up with an object description that looks like this:

  object(type: "Derived") {
    field(name: "x")
    field(name: "y")
    field(name: "s")
  }

A type that is used in embedFields must be inspected as an object, i.e., its inspect function must use object(..).fields(...). If this requirement is not met the compiler should fail with a static_assert that points that out.
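The flattening effect of embedding can be reproduced with a toy inspector. NameInspector below is a hypothetical stand-in written for this sketch (it is not the library's implementation, and it does not perform the static_assert check mentioned above); every part of a description simply evaluates to the list of field names it contributes.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy inspector: every part of a description evaluates to the
// list of field names it contributes.
struct NameInspector {
  using Names = std::vector<std::string>;

  struct Object {
    // Concatenates the names contributed by all parts, in order.
    template<class... Parts>
    Names fields(Parts const&... parts) {
      Names all;
      (all.insert(all.end(), parts.begin(), parts.end()), ...);
      return all;
    }
  };

  template<class T>
  Object object(T&) { return Object{}; }

  template<class T>
  Names field(std::string_view name, T&) { return Names{std::string(name)}; }

  // Embedding re-runs the nested type's inspect function and splices its
  // field names into the surrounding object.
  template<class T>
  Names embedFields(T& v) { return inspect(*this, v); }
};

struct Inner { std::string s; };
template<class Inspector>
auto inspect(Inspector& f, Inner& v) {
  return f.object(v).fields(f.field("s", v.s));
}

struct Base { int x = 0; };
template<class Inspector>
auto inspect(Inspector& f, Base& v) {
  return f.object(v).fields(f.field("x", v.x));
}

struct Derived : Base {
  int y = 0;
  Inner i;
};
template<class Inspector>
auto inspect(Inspector& f, Derived& v) {
  return f.object(v).fields(f.embedFields(static_cast<Base&>(v)),
                            f.field("y", v.y),
                            f.embedFields(v.i));
}

// Returns the flattened field names of Derived: x, y, s.
std::vector<std::string> fieldNamesOfDerived() {
  NameInspector f;
  Derived d;
  return inspect(f, d);
}
```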

Specializing `Access`

Instead of writing inspect functions, one can specialize arangodb::inspection::Access. This not only makes it possible to work with third-party libraries (for which adding free functions is usually ruled out), but also allows customizing every step of the inspection process. It requires writing more boilerplate code, though.

The full interface of Access looks as follows:

template <class T>
struct Access {
  template <class Inspector>
  static bool apply(Inspector& f, T& x);

  template<class Inspector>
  static auto saveField(Inspector& f, std::string_view name,
                        bool hasFallback, Value& val);

  template<class Inspector, class Transformer>
  static auto saveTransformedField(Inspector& f,
                                   std::string_view name,
                                   bool hasFallback, Value& val,
                                   Transformer& transformer);

  template<class Inspector>
  static Status loadField(Inspector& f, std::string_view name,
                          bool isPresent, Value& val);

  template<class Inspector, class ApplyFallback>
  static Status loadField(Inspector& f, std::string_view name,
                          bool isPresent, Value& val,
                          ApplyFallback&& applyFallback);

  template<class Inspector, class Transformer>
  static auto loadTransformedField(Inspector& f,
                                   std::string_view name,
                                   bool isPresent, Value& val,
                                   Transformer& transformer);

  template<class Inspector, class ApplyFallback, class Transformer>
  static Status loadTransformedField(
      Inspector& f, std::string_view name, bool isPresent, Value& val,
      ApplyFallback&& applyFallback, Transformer& transformer);
};

Optionals

We previously saw how we can use fallback values for attributes that are not mandatory. As an alternative to fallback values, one can use optional values. std::optional, std::unique_ptr and std::shared_ptr all qualify as optional values. That is, if no matching attribute is found, the field is set to std::nullopt/nullptr. Otherwise, a default-constructed instance of the wrapped type is created and inspected recursively.
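The loading rule for optional values can be sketched by hand as follows. The Input type and the loadField overloads are hypothetical helpers for this illustration only and are unrelated to Access::loadField; the wrapped type is int to keep the "inspect recursively" step trivial.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical input: attribute name -> integer value.
using Input = std::map<std::string, int>;

// std::optional: a missing attribute resets the field to std::nullopt.
void loadField(Input const& in, std::string const& name,
               std::optional<int>& field) {
  auto it = in.find(name);
  if (it == in.end()) {
    field = std::nullopt;
    return;
  }
  field.emplace();      // default-constructed wrapped value ...
  *field = it->second;  // ... which is then filled from the input
}

// std::unique_ptr: a missing attribute resets the field to nullptr.
void loadField(Input const& in, std::string const& name,
               std::unique_ptr<int>& field) {
  auto it = in.find(name);
  if (it == in.end()) {
    field = nullptr;
    return;
  }
  field = std::make_unique<int>();
  *field = it->second;
}

void usageExample() {
  Input in{{"a", 5}};
  std::optional<int> a = 1;
  std::optional<int> b = 2;
  loadField(in, "a", a);
  loadField(in, "b", b);
  assert(a.has_value() && *a == 5);
  assert(!b.has_value());  // "b" is absent, so the old value is discarded
}
```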

Variants

Even though support for variants is built-in, the inspection library needs some additional information about the variant type. There is an API to describe variants, very similar to what we previously saw for objects. A variant value can be encoded in different ways. There are inline types and non-inline types. The latter have a dedicated type indicator field, while the former do not. Non-inline types can come in three different forms: "qualified", "unqualified" and "embedded".

Consider the following example:

using MyVariant = std::variant<std::string, int, Struct1>;

template<class Inspector>
auto inspect(Inspector& f, MyVariant& x) {
  namespace insp = arangodb::inspection;
  return f.variant(x).qualified("type", "value").alternatives(
      insp::inlineType<std::string>(),
      insp::type<int>("int"),
      insp::type<Struct1>("Struct1"));
}

This serializes/deserializes the variant in "qualified form", where string is defined as an inline type, while int and Struct1 are non-inline types.

As already mentioned, inline types do not have a type indicator; instead, the values are stored directly as-is. So writing inline types is very straightforward, but for parsing we have to determine the type based on the data. We do this by trying to parse each of the inline types in the order in which they are specified. If the parse is successful, then that is the type and value that we use; otherwise we continue with the next type. So if you have two types A and B where A is a supertype of B (i.e., every possible value for B would also be a possible value for A, for example double/int), then B must be listed before A.

Note: any errors (including failed invariants) that occur while trying to parse inline types are ignored and the type is dismissed!
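Why the order of inline types matters can be seen in a small stand-alone sketch. It assumes a text token as input instead of velocypack, and tryParse and parseFirstMatch are hypothetical helpers written for this example; the only point is the "first type that parses wins" rule.

```cpp
#include <cassert>
#include <charconv>
#include <cstdlib>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <variant>

// Succeeds only if the whole token is a valid int.
bool tryParse(std::string const& s, int& out) {
  char const* first = s.data();
  char const* last = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(first, last, out);
  return ec == std::errc{} && ptr == last;
}

// Succeeds only if the whole token is a valid double.
bool tryParse(std::string const& s, double& out) {
  char* end = nullptr;
  out = std::strtod(s.c_str(), &end);
  return !s.empty() && end == s.c_str() + s.size();
}

// Tries the candidate types Ts in the given order; the first successful
// parse decides both the type and the value. Failures are silently skipped.
template<class Variant, class... Ts>
std::optional<Variant> parseFirstMatch(std::string const& s) {
  std::optional<Variant> result;
  auto attempt = [&](auto candidate) {
    if (result.has_value()) return;
    if (tryParse(s, candidate)) {
      result = Variant{std::in_place_type<decltype(candidate)>, candidate};
    }
  };
  (attempt(Ts{}), ...);
  return result;
}

void usageExample() {
  using V = std::variant<int, double>;
  // int listed first: "42" is recognized as an int.
  auto intFirst = parseFirstMatch<V, int, double>("42");
  assert(intFirst.has_value() && std::holds_alternative<int>(*intFirst));
  // double listed first: the same token is swallowed by double.
  auto doubleFirst = parseFirstMatch<V, double, int>("42");
  assert(doubleFirst.has_value() &&
         std::holds_alternative<double>(*doubleFirst));
}
```

In the second call the int alternative can never be selected, which is the situation the ordering rule is meant to avoid.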

Inline types are primarily useful for scalar types like string, int or bool, but you can actually use them for arbitrary types. We try to be smart and rule out some cases that we know will fail to parse based on the velocypack type, e.g., if our target type is string and the velocypack type is something else. But if none of these checks fail, we simply have to try to parse into an instance of the current target type, so we have to create a default-constructed instance.

Inline types are always checked first (and therefore must also be listed before the non-inline types, otherwise you will get a compiler error pointing this out). Only if none of those could be parsed do we move on to the non-inline types.

In "qualified" form the variant is serialized as an object with two attributes as specified in the qualified call. For example:

{
  "type": "int",
  "value": 42
}

In "unqualified" form the variant is also serialized as an object, but only uses a single attribute with the type name. Suppose we would write the inspect function as follows:

template<class Inspector>
auto inspect(Inspector& f, MyVariant& x) {
  namespace insp = arangodb::inspection;
  return f.variant(x).unqualified().alternatives(
      insp::type<std::string>("string"),
      insp::type<int>("int"),
      insp::type<Struct1>("Struct1"));
}

Then the generated result would instead look like this:

{
  "string": "foobar"
}

The "embedded" form can only be used if all types (except for the inline types) in the variant are inspected as objects. In this form the type indicator field is then serialized on the same level as the object fields:

struct Struct1 { int a; };
struct Struct2 { int b; };
using MyEmbeddedVariant = std::variant<Struct1, Struct2>;

template<class Inspector>
auto inspect(Inspector& f, MyEmbeddedVariant& x) {
  namespace insp = arangodb::inspection;
  return f.variant(x).embedded("type").alternatives(
      insp::type<Struct1>("Struct1"),
      insp::type<Struct2>("Struct2"));
}

If we were to serialize a Struct1{.a = 42}, this would generate the following result:

{
  "type": "Struct1",
  "a": 42
}

Enums

There is a separate inspect API to define value mappings for enum types.


enum class MyStringEnum {
  kValue1,
  kValue2,
  kValue3 = kValue2,
};

template<class Inspector>
auto inspect(Inspector& f, MyStringEnum& x) {
  return f.enumeration(x).values(MyStringEnum::kValue1, "value1",  //
                                 MyStringEnum::kValue2, "value2");
}

The call to values takes an arbitrary number of arguments, which are consumed pairwise: the first value of each pair is the enum value, and the second is the value that it is mapped to.

Enum values can be mapped to strings or integers. It is also possible to map one enum value to multiple different strings and/or ints (i.e., you can mix the target types). For example:

enum class MyMixedEnum {
  kValue1,
  kValue2,
};

template<class Inspector>
auto inspect(Inspector& f, MyMixedEnum& x) {
  return f.enumeration(x).values(MyMixedEnum::kValue1, "value1",  //
                                 MyMixedEnum::kValue1, 1,         //
                                 MyMixedEnum::kValue2, "value2",  //
                                 MyMixedEnum::kValue2, 2);
}

In this case both enum values are mapped to both a string and an integer value. This means we can load both string and int values and map them to our enum. When saving, the first mapping will be used, so in this example both values would be saved as a string.
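The asymmetry between loading and saving can be sketched with a plain lookup table. Serialized, Mapping, mappings, save and load are hypothetical names for this illustration; the library's enumeration support is not implemented this way as far as this document states, the sketch only mirrors the described behavior.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

enum class MyMixedEnum { kValue1, kValue2 };

// A serialized enum value is either a string or an int.
using Serialized = std::variant<std::string, int>;

struct Mapping {
  MyMixedEnum value;
  Serialized serialized;
};

// The same pairs, in the same order, as in the values(...) call above.
std::vector<Mapping> const& mappings() {
  static std::vector<Mapping> const table{
      {MyMixedEnum::kValue1, std::string("value1")},
      {MyMixedEnum::kValue1, 1},
      {MyMixedEnum::kValue2, std::string("value2")},
      {MyMixedEnum::kValue2, 2},
  };
  return table;
}

// Saving uses the first mapping listed for the enum value.
std::optional<Serialized> save(MyMixedEnum v) {
  for (auto const& m : mappings()) {
    if (m.value == v) return m.serialized;
  }
  return std::nullopt;
}

// Loading accepts any listed representation, string or int.
std::optional<MyMixedEnum> load(Serialized const& s) {
  for (auto const& m : mappings()) {
    if (m.serialized == s) return m.value;
  }
  return std::nullopt;
}

void usageExample() {
  auto fromString = load(Serialized{std::string("value2")});
  auto fromInt = load(Serialized{2});
  assert(fromString == MyMixedEnum::kValue2);
  assert(fromInt == MyMixedEnum::kValue2);
  auto saved = save(MyMixedEnum::kValue2);
  assert(saved.has_value());
  assert(std::holds_alternative<std::string>(*saved));  // first mapping wins
}
```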

Transformers

In some cases we want to serialize the same type differently, e.g., in one case we might want to serialize a std::chrono::time_point as an ISO 8601 string, while in another case we want a Unix timestamp. This can be achieved by using transformers. A transformer is simply a type that provides an alias SerializedType and two member functions, toSerialized and fromSerialized.

For example see this dummy Transformer which converts between int and string:

struct DummyTransformer {
  using SerializedType = std::string;

  arangodb::inspection::Status toSerialized(int v,
                                            SerializedType& result) const {
    result = std::to_string(v);
    return {};
  }
  arangodb::inspection::Status fromSerialized(SerializedType const& v,
                                              int& result) const {
    result = std::stoi(v);
    return {};
  }
};

The transformer is then applied to a field as follows:

struct Foo {
  int x;
};

template<class Inspector>
auto inspect(Inspector& f, Foo& x) {
  return f.object(x).fields(f.field("x", x.x).transformWith(DummyTransformer{}));
}

So even though the field is an int, during serialization it will be transformed into a string. During deserialization we convert back from string to int.
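The round trip through a transformer can be tried out in isolation. In the sketch below, Status is a minimal stand-in defined only so that the example compiles without the library (the real arangodb::inspection::Status may look different), and IntAsStringTransformer is a hypothetical variant of the dummy transformer that reports malformed input as an error instead of letting std::stoi throw.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>

// Minimal stand-in for arangodb::inspection::Status (an assumption made so
// that this sketch is self-contained). An empty error means success.
struct Status {
  std::string error;
  bool ok() const { return error.empty(); }
};

struct IntAsStringTransformer {
  using SerializedType = std::string;

  Status toSerialized(int v, SerializedType& result) const {
    result = std::to_string(v);
    return {};
  }

  Status fromSerialized(SerializedType const& v, int& result) const {
    try {
      std::size_t pos = 0;
      int parsed = std::stoi(v, &pos);
      if (pos != v.size()) {
        return {"trailing characters in '" + v + "'"};
      }
      result = parsed;
      return {};
    } catch (std::exception const&) {
      // std::stoi throws on non-numeric or out-of-range input.
      return {"not an integer: '" + v + "'"};
    }
  }
};

void usageExample() {
  IntAsStringTransformer t;
  std::string text;
  Status saved = t.toSerialized(42, text);
  assert(saved.ok() && text == "42");

  int back = 0;
  Status loaded = t.fromSerialized(text, back);
  assert(loaded.ok() && back == 42);

  Status broken = t.fromSerialized("4x", back);
  assert(!broken.ok());
}
```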

Splitting Save and Load

Usually, load and save operations are symmetric, allowing us to provide a single description of the data and leave the rest to the different inspectors. But in some cases writing custom inspect functions with a single overload for all inspectors may result in undesired tradeoffs or convoluted code. In these cases it can be beneficial to split the code into separate save and load parts. For this reason, all inspectors provide a static constant called isLoading. This makes it possible to use if constexpr, for example to delegate to custom functions:

template <class Inspector>
auto inspect(Inspector& f, my_class& x) {
  if constexpr (Inspector::isLoading) {
    return load(f, x);
  } else {
    return save(f, x);
  }
}
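A complete, compilable version of this pattern might look like the following sketch. ToySaveInspector, ToyLoadInspector and Percent are hypothetical types made up for the example; the toy inspectors just hold a string and expose the isLoading constant, nothing more.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical toy inspectors: each one only carries a string buffer and the
// isLoading constant that inspect dispatches on.
struct ToySaveInspector {
  static constexpr bool isLoading = false;
  std::string out;
};

struct ToyLoadInspector {
  static constexpr bool isLoading = true;
  std::string in;
};

struct Percent {
  int value = 0;
};

// Saving writes the number followed by a percent sign, e.g. "42%".
bool save(ToySaveInspector& f, Percent& x) {
  f.out = std::to_string(x.value) + "%";
  return true;
}

// Loading has to validate and strip the trailing percent sign again.
bool load(ToyLoadInspector& f, Percent& x) {
  if (f.in.empty() || f.in.back() != '%') return false;
  try {
    x.value = std::stoi(f.in.substr(0, f.in.size() - 1));
    return true;
  } catch (std::exception const&) {
    return false;
  }
}

template<class Inspector>
bool inspect(Inspector& f, Percent& x) {
  // Only the branch matching the inspector is instantiated.
  if constexpr (Inspector::isLoading) {
    return load(f, x);
  } else {
    return save(f, x);
  }
}

void usageExample() {
  ToySaveInspector saver;
  Percent p{42};
  [[maybe_unused]] bool saved = inspect(saver, p);
  assert(saved && saver.out == "42%");

  ToyLoadInspector loader;
  loader.in = saver.out;
  Percent q;
  [[maybe_unused]] bool loaded = inspect(loader, q);
  assert(loaded && q.value == 42);
}
```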

Serializing / Deserializing

Writing the inspect functions is the prerequisite, but in order to actually perform (de)serialization we have to create the corresponding inspectors. We currently have VPackSaveInspector, VPackLoadInspector and VPackUnsafeLoadInspector. You have to use the last one in order to deserialize unsafe types like std::string_view or velocypack::Slice. Those are considered unsafe since they only store a pointer into the velocypack buffer being parsed, so if the parsed data outlives that buffer, we end up with dangling pointers. Therefore, if you want to deserialize unsafe types, you have to use the correct inspector and take care that all pointers remain valid.

To make things even easier, Inspection/VPack.h provides some free-standing functions for serializing/deserializing to/from velocypack:

namespace arangodb::velocypack {
  template<class T>
  void serialize(Builder& builder, T& value);

  template<class T>
  void deserialize(Slice slice, T& result,
                  inspection::ParseOptions options = {});

  template<class T>
  void deserializeUnsafe(Slice slice, T& result,
                        inspection::ParseOptions options = {});

  template<class T>
  T deserialize(Slice slice, inspection::ParseOptions options = {});

  template<class T>
  T deserializeUnsafe(Slice slice, inspection::ParseOptions options = {});
}  // namespace arangodb::velocypack

These functions will throw in case anything goes wrong. It is recommended to use these functions instead of using the inspector types directly.

By default, deserialization is very strict: the given velocypack must contain exactly those attributes specified in the object description. If an attribute of non-optional type is missing and has no fallback, this results in a "missing required attribute" error. Likewise, any additional attribute that exists in the velocypack but not in the object description results in an "unexpected attribute" error. The ParseOptions provide a way to relax those checks and ignore unknown and/or missing attributes. In case missing attributes are ignored, the corresponding fields simply remain untouched and keep their original value.
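The strict and relaxed behavior can be imitated with a small stand-alone sketch. ToyParseOptions and its two flags are assumptions made for this example (the member names of the real inspection::ParseOptions are not given in this document), and deserializePoint works on a std::map instead of velocypack.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical options type; the flag names are assumptions for this sketch.
struct ToyParseOptions {
  bool ignoreUnknownFields = false;
  bool ignoreMissingFields = false;
};

struct Point {
  int x = 0;
  int y = 0;
};

// Hypothetical input: attribute name -> integer value.
using Input = std::map<std::string, int>;

// Throws std::runtime_error on unknown or missing attributes unless the
// corresponding check has been relaxed via the options.
void deserializePoint(Input const& in, Point& p, ToyParseOptions opts = {}) {
  for (auto const& entry : in) {
    bool known = entry.first == "x" || entry.first == "y";
    if (!known && !opts.ignoreUnknownFields) {
      throw std::runtime_error("unexpected attribute " + entry.first);
    }
  }
  auto loadField = [&](char const* name, int& target) {
    auto it = in.find(name);
    if (it != in.end()) {
      target = it->second;
    } else if (!opts.ignoreMissingFields) {
      throw std::runtime_error(std::string("missing required attribute ") +
                               name);
    }
    // Otherwise the field is left untouched and keeps its current value.
  };
  loadField("x", p.x);
  loadField("y", p.y);
}

void usageExample() {
  Point p{7, 8};
  Input partial{{"x", 1}};
  ToyParseOptions lenient;
  lenient.ignoreMissingFields = true;
  deserializePoint(partial, p, lenient);
  assert(p.x == 1);
  assert(p.y == 8);  // "y" was missing, so the original value is kept
}
```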