Reflections: Object-Relational Mapper

This article is based on a discussion with Raphael Parree. Many thanks to him.

What is an ORM?

An ORM is a framework that helps to map objects (as in object-oriented programming) into persistent relational data structures (tables in a relational database).

https://en.wikipedia.org/wiki/Object-relational_mapping

The Hibernate framework is an ORM framework.

Is there a use-case for ORMs?

In object-oriented programming, encapsulation is key. It hides the internals of objects, which expose only their interface. Domain objects (objects that represent things of the real world) are accessed through their interface, whose operations may change the internal state of the object. In pure object-oriented design, these internals are an implementation detail hidden from the object's client. The client does not know how the object has changed internally. It only has the guarantee (barring bugs) that the requested change has been applied: the requested state transition of the object has been executed, and the object is now in the requested state.
If this object needs to be persisted, a very widely adopted practice is to store the object's state in a relational database. This ensures that the object is not lost (durability). It also allows the object to be removed from volatile working memory, so that the freed memory can be reused for other purposes.
At a later stage, some process may need to access the object again. As the state of the object has been persisted, we need to rebuild the object using this persisted state data.
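The encapsulated domain object described above can be sketched in Java as follows. The `Order` class, its placed/paid/shipped lifecycle and its attribute names are hypothetical and chosen only for illustration. The client requests state transitions through operations, while the attributes that hold the state stay private.

```java
import java.time.Instant;

// Hypothetical domain object: clients request state transitions through
// operations and never read or write the private attributes directly.
class Order {
    private enum Status { PLACED, PAID, SHIPPED }

    // Internal attributes: together they define the object's state.
    private Status status = Status.PLACED;
    private Instant paidAt;
    private Instant shippedAt;

    // State transition, only allowed from PLACED.
    void pay(Instant when) {
        requireStatus(Status.PLACED);
        status = Status.PAID;
        paidAt = when;
    }

    // State transition, only allowed from PAID.
    void ship(Instant when) {
        requireStatus(Status.PAID);
        status = Status.SHIPPED;
        shippedAt = when;
    }

    // Property derived from the attributes; the only view clients get.
    boolean isShipped() {
        return status == Status.SHIPPED;
    }

    private void requireStatus(Status expected) {
        if (status != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + status);
        }
    }
}
```

Nothing outside `Order` can tell that shipping also records a timestamp. That hidden information is exactly what a persistence mechanism needs and what the client lacks.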

From the previous paragraphs, we derive the following use-cases related to data persistence:

One needs to persistently store an object's state, possibly into a relational database.

One needs to retrieve this persisted state, possibly from a relational database, in order to re-build the object, with the same state.

NOTE: these are use cases for persistence, not use cases for ORMs.

About the state of an object

An object has behaviour and state. The behaviour of an object is defined by operations that perform state transitions of the object.
What defines the state of an object? Its attributes. The object features a set of attributes whose combined values define its state. Their values also define which state transitions are allowed or not.
An object also has properties: these are derived from the object's attributes, and the object's interface may offer operations that let its clients ask for the values of these properties. However, due to encapsulation, the object's properties (exposed, public, external) may differ from its attributes (hidden, private/protected, internal).
As the state of the object is defined by its attributes, it is sufficient to persist the values of these attributes to be able to rebuild the object later: read these attributes from the persistent store and set their values on a new instance of the same class as the original object.
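The rebuild step described above can be sketched with plain Java reflection, which is roughly what field-based access in an ORM relies on. `AttributeSnapshot` and `Counter` are hypothetical names, the `Map` merely stands in for a table row, and the sketch assumes the class has a no-argument constructor and no inherited attributes.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that reads and writes an object's private attributes,
// bypassing its public interface.
final class AttributeSnapshot {
    private AttributeSnapshot() {}

    // Persist: collect attribute name -> value (a stand-in for a table row).
    static Map<String, Object> capture(Object entity) {
        Map<String, Object> row = new HashMap<>();
        try {
            for (Field f : stateFields(entity.getClass())) {
                row.put(f.getName(), f.get(entity));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return row;
    }

    // Rebuild: create a new instance and set the stored values on its attributes.
    static <T> T restore(Class<T> type, Map<String, Object> row) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T entity = ctor.newInstance();
            for (Field f : stateFields(type)) {
                f.set(entity, row.get(f.getName()));
            }
            return entity;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The class's own instance fields; static and compiler-generated ones are skipped.
    private static List<Field> stateFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            fields.add(f);
        }
        return fields;
    }
}

// Hypothetical entity: one private attribute, one derived property.
class Counter {
    private int ticks;

    void tick() { ticks++; }

    boolean isEven() { return ticks % 2 == 0; }
}
```

A client of `Counter` never sees `ticks`. Only a component that knows the class internals, here the helper, can capture the state and rebuild an equivalent object.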

A use-case for ORMs

In this whole story, the client of the object has no idea what the values of its attributes are. The client therefore cannot know what to store. This means that only components that know about the internals of the object are able to determine which attributes have to be persistently stored.
This is what an ORM does. This is what Hibernate, OpenJPA, EclipseLink and TopLink, and in fact any JPA implementation, do. An ORM knows about the internal attributes of the objects that it persists into a relational data store. The client asks the ORM to store or retrieve the object, and the ORM does it; the client does not need to know the internals of the object to use these operations. In many cases, the ORM even synchronises the in-memory object's state with the persistent data store automatically, at the latest just before the transaction ends. This makes sense, as the client does not really know whether and how the object's internal state has changed, and thus has no way to know whether the persisted state must be updated.
ORMs permit us to leverage object orientation, with its powerful encapsulation, when we design a domain model. This is the actual use-case for an ORM.
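The automatic synchronisation described above is usually called dirty checking. Here is a toy sketch of it, assuming a hypothetical `UnitOfWork` class. It snapshots an object's private attributes when the object is attached, compares them at flush time, and derives an UPDATE only for the attributes that changed. This is not how Hibernate or any other JPA implementation is built. It only illustrates why the ORM, and not the client, can decide what to write.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Toy unit of work: snapshots each attached object's attributes and, at flush
// time, derives an UPDATE for the attributes that changed.
final class UnitOfWork {
    private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();

    // The object is now managed: remember its current (persisted) state.
    void attach(Object entity) {
        snapshots.put(entity, attributes(entity));
    }

    // Called just before commit: compare every managed object with its snapshot.
    List<String> flush() {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<Object, Map<String, Object>> entry : snapshots.entrySet()) {
            Map<String, Object> current = attributes(entry.getKey());
            List<String> assignments = new ArrayList<>();
            for (Map.Entry<String, Object> attr : current.entrySet()) {
                if (!Objects.equals(entry.getValue().get(attr.getKey()), attr.getValue())) {
                    assignments.add(attr.getKey() + " = ?");
                }
            }
            if (!assignments.isEmpty()) {
                // Assumption: table named after the class, keyed by an "id" column.
                String table = entry.getKey().getClass().getSimpleName().toLowerCase(Locale.ROOT);
                statements.add("UPDATE " + table + " SET " + String.join(", ", assignments) + " WHERE id = ?");
                entry.setValue(current); // the flushed state becomes the new baseline
            }
        }
        return statements;
    }

    private static Map<String, Object> attributes(Object entity) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            try {
                f.setAccessible(true);
                values.put(f.getName(), f.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }
}

// Hypothetical entity: the client calls rename() and never sees the fields.
class Customer {
    private long id = 1;
    private String name = "Ada";
    private int version = 0;

    void rename(String newName) {
        name = newName;
        version++;
    }
}
```

The client only called `rename`. It did not know that `version` also changed, yet the UPDATE covers both attributes.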

Do we need ORMs?

The anemic domain model

In the Java world (at least), most systems use what Martin Fowler calls an anemic domain model: domain models consist of classes whose attributes are exposed as properties through the well-known getter and setter methods. These objects also usually lack operations that represent their actual state transitions and that encapsulate (and hide) how those transitions occur. To change the state of an object, instead of requesting a state transition through one of the operations exposed by the object's interface, the client usually just sets the values of the object's attributes.
In this world (our world, the real world), such anemic domain objects have no more features than a struct in the C programming language. One could therefore wonder why getters and setters are needed to expose the object's attributes (which are usually private) instead of simply making these attributes public. In fact, with recent versions of Hibernate, nothing justifies them: the first versions of Hibernate could not get/set attribute values directly through reflection (and needed getter/setter methods), but this is no longer the case.
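For contrast, here is a sketch of the anemic style, with hypothetical `AnemicOrder` and `OrderStruct` classes. Once every attribute has a getter and a setter, the class offers nothing that public fields would not. The client, not the object, decides what "shipping" means.

```java
import java.time.Instant;

// Hypothetical anemic domain object: each private attribute is exposed
// one-to-one through a getter and a setter.
class AnemicOrder {
    private String status = "PLACED";
    private Instant shippedAt;

    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
    public Instant getShippedAt() { return shippedAt; }
    public void setShippedAt(Instant shippedAt) { this.shippedAt = shippedAt; }
}

// The same thing written like a C struct: public fields, no behaviour.
class OrderStruct {
    public String status = "PLACED";
    public Instant shippedAt;
}

// The client implements the state transition itself; nothing stops it
// from shipping an order that was never paid.
final class ShippingClient {
    private ShippingClient() {}

    static void ship(AnemicOrder order, Instant when) {
        order.setStatus("SHIPPED");
        order.setShippedAt(when);
    }

    static void ship(OrderStruct order, Instant when) {
        order.status = "SHIPPED";
        order.shippedAt = when;
    }
}
```

Both versions end up in the same state. The rules of the transition live in `ShippingClient`, so the caller already knows exactly which attributes it changed.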

Service operations as transaction scripts

As we rely on the anemic domain model, service operations (the ones that define the boundaries of a transaction) usually consist of retrieving objects and changing their attributes through a series of setter calls.
This means that the service operation de facto knows what will be persisted, and merely records it in the object. At the latest at the end of the service operation, just before the transaction is committed, the ORM framework executes SQL INSERT/UPDATE/DELETE statements in order to persist the new values that the service operation set through the setter methods. (This is actually why Hibernate is a framework: you don't always need to call it; it automatically synchronises the in-memory object state with the persisted data.)
Here, the service operation doesn't actually need an ORM to persist the objects, as there is no encapsulation: there is no hidden attribute that the service is unaware of.
Another way to implement the service is to have it execute the SQL statements that update the object's persisted state itself. This is fundamentally different from using an ORM, but the outcome regarding the persisted state is equivalent.
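Here is a sketch of such a transaction script, under stated assumptions. `SqlExecutor` is a hypothetical interface whose `update(String, Object...)` method has the same shape as Spring's `JdbcTemplate.update(String, Object...)`. The `orders` table and its columns are invented for the example. The service states explicitly which columns change, so there is no hidden state to discover.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over the data access library.
interface SqlExecutor {
    int update(String sql, Object... params); // returns the number of affected rows
}

// Transaction script: the service itself says what is persisted.
final class ShippingService {
    private final SqlExecutor sql;

    ShippingService(SqlExecutor sql) {
        this.sql = sql;
    }

    // In a real system this method would run inside a single transaction.
    void shipOrder(long orderId, Instant when) {
        int rows = sql.update(
            "UPDATE orders SET status = ?, shipped_at = ? WHERE id = ? AND status = ?",
            "SHIPPED", when, orderId, "PAID");
        if (rows != 1) {
            throw new IllegalStateException("order " + orderId + " is not in state PAID");
        }
    }
}

// Test double that records statements instead of talking to a database.
class RecordingExecutor implements SqlExecutor {
    final List<String> log = new ArrayList<>();
    int rowsToReturn = 1;

    @Override
    public int update(String sql, Object... params) {
        log.add(sql + " " + Arrays.toString(params));
        return rowsToReturn;
    }
}
```

No snapshot comparison or graph traversal is involved. The statement in the code is the statement that runs.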

The fundamental differences between ORMs and other data access frameworks or libraries

ORMs are complex frameworks that do many complex things under the hood in order to persist object state and retrieve it. They behave as if by magic. Magic is difficult to control and to learn. ORMs are difficult to control and to learn. Do you need this in your software?
Remember your use-cases for persistence: store an object's state and retrieve it.
OK, ORMs can do this, but they are quite complex. Another solution is to have your service operations execute SQL statements (using a Repository/DAO class for this is OK, but not strictly required; this discussion is out of scope for this article). Doing this, you have full control over what happens in your software. You don't need to spend years using the ORM to finally understand how it works. There is no magic any more, just the complexity that your business logic requires.
Other impacts of using an ORM:

If you specify that an attribute is lazily loaded, then it will always be lazy-loaded, whatever your use-case. But actually, lazy-loading would make sense in some use-cases, while in others, it would not. With an ORM, you have to make a choice (lazy or eager loading) and then you must stick to it for all your use-cases.
ORMs work on an object graph: they cascade some operations (you may specify this) and they load a graph of objects, starting from the root object and then traversing the graph using SQL JOINs. If you just need the root object, you'll get the whole graph, for free. Does this always make sense? Well, it depends on the specific use case. This also brings its own set of problems (n+1 selects, for example).
The JPA specification does not define the behaviour of the ORM if two classes are mapped on the same table: this implies that you should have one class per table, if you want guaranteed behaviour.
The exact generated SQL code depends on the ORM implementation, and there is no guarantee that it does not change between versions of the same ORM: you upgrade your ORM, and suddenly your application could become very fast, or very slow, and this is out of your control.
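The n+1 selects problem and the lazy/eager trade-off can be illustrated without any ORM. The sketch below uses a hypothetical in-memory `Database` that only counts the queries it receives, and the SQL in the comments is what each method stands for. Reading N orders and then loading each order's lines on access costs 1 + N queries. A single JOIN query written for that specific use case costs one.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory "database" that counts the queries it receives.
final class Database {
    record OrderRow(long id) {}
    record LineRow(long orderId, String product) {}

    private final List<OrderRow> orders;
    private final List<LineRow> lines;
    int queries = 0;

    Database(List<OrderRow> orders, List<LineRow> lines) {
        this.orders = orders;
        this.lines = lines;
    }

    // SELECT * FROM orders
    List<OrderRow> selectOrders() {
        queries++;
        return orders;
    }

    // SELECT * FROM order_lines WHERE order_id = ?
    List<LineRow> selectLines(long orderId) {
        queries++;
        return lines.stream().filter(l -> l.orderId() == orderId).collect(Collectors.toList());
    }

    // SELECT o.id, l.product FROM orders o JOIN order_lines l ON l.order_id = o.id
    Map<Long, List<LineRow>> selectOrdersWithLines() {
        queries++;
        return lines.stream().collect(Collectors.groupingBy(LineRow::orderId));
    }
}

final class Reports {
    private Reports() {}

    // Lazy-loading pattern: 1 query for the orders, then 1 query per order.
    static int countLinesLazily(Database db) {
        int total = 0;
        for (Database.OrderRow order : db.selectOrders()) {
            total += db.selectLines(order.id()).size();
        }
        return total;
    }

    // Query chosen for this use case: a single round trip.
    static int countLinesWithJoin(Database db) {
        return db.selectOrdersWithLines().values().stream().mapToInt(List::size).sum();
    }
}
```

Neither strategy is wrong in itself. Each one fits some use cases and not others, which is why a single mapping-level choice can be limiting.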

Conclusion

As Martin Fowler documented, we usually do not apply object orientation, but rather use an anemic domain model. Given this, we also usually implement service operations as transaction scripts. In this case, is there still a reason to use ORMs? I am not sure there is.
There are many ways to access relational data: directly using JDBC (I wouldn't recommend this in most cases), Spring's JDBC template, MyBatis... you name it.
These tools usually let you work with different projections of the same entity, which de facto means that you can put the data of one table into different classes (one per projection), whereas JPA implementations (at least) do not guarantee any behaviour when several classes are mapped to the same table.
These tools do work with objects, but only as a means to avoid working directly with JDBC's result set and to bind SQL statement parameters from object attributes. No magic behaviour is involved with these tools, at least nothing that affects the clients of these frameworks.
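Here is a sketch of this style, using two hypothetical projections of the same `orders` table, each with its own SQL and its own mapping function. The `Map` stands in for one result-set row. In practice a library such as Spring's `RowMapper` or MyBatis performs this mapping. The table, column and class names are assumptions.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.function.Function;

// Two hypothetical projections of the same "orders" table, one per use case.
record OrderSummary(long id, String status) {}
record OrderForInvoice(long id, String customerName, BigDecimal total) {}

final class OrderProjections {
    private OrderProjections() {}

    // Listing use case: only the columns the listing needs.
    static final String SUMMARY_SQL = "SELECT id, status FROM orders WHERE customer_id = ?";
    static final Function<Map<String, Object>, OrderSummary> SUMMARY =
        row -> new OrderSummary((Long) row.get("id"), (String) row.get("status"));

    // Invoicing use case: different columns, different class, same table.
    static final String INVOICE_SQL = "SELECT id, customer_name, total FROM orders WHERE id = ?";
    static final Function<Map<String, Object>, OrderForInvoice> INVOICE =
        row -> new OrderForInvoice(
            (Long) row.get("id"),
            (String) row.get("customer_name"),
            (BigDecimal) row.get("total"));
}
```

Each use case reads exactly the columns it needs into a class shaped for it. No managed state is attached to either object afterwards.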
You will be in full control of the executed SQL code, and when the code is executed. You will be in full control of how the data is updated (no useless attributes updated, no object graph traversal to find modified objects/attributes - or however this is implemented) or retrieved (no useless attributes, no useless SQL JOINs...), and it will correspond to your use-case's requirements (and not to a choice you made because of another use-case).
Given that practically the whole world uses anemic domain models, is there still a good reason to use ORMs? Systems whose domain models leverage encapsulation do justify the use of an ORM.
For other use cases, what is the added value of using ORMs?


Object-Relational Mapper - Do we need this?

2017-10-14