How OneWebSQL Differs from Hibernate

There's no question that Hibernate absolutely dominates the O/R mapping market for Java. However, it's practically impossible to find a user that would declare love — or even excitement — for working with it. On the contrary: everybody seems to have been bitten by it more than once and has war stories to share.

Let's see how OneWebSQL compares to Hibernate, and how and why it's different. It's high time to confront the 800-pound gorilla in the room.

I'm too lazy to read, show me a picture!

All right, for all of you who want a simple, head-to-head comparison, see the table below. In the following sections, I'll elaborate on the differences and give the rationale behind every design decision that makes OneWebSQL so unlike Hibernate.

Feature | Hibernate | OneWebSQL
ORM Philosophy | business domain modeling | relational data processing
Schema Definition | implicit, scattered (HBM files, Java annotations) | explicit, centralized (graphical model, ERD diagrams)
Code Generation | partial, via third-party tools | central part of the product
Query Language | HQL in plain Java strings | object-oriented, type-safe SQL DSL
Inheritance | supported | N/A; explicitly modeled and coded using views and association tables
Lazy Loading | required to curb excessive data loading | all operations eager, operations map 1:1 with SQL statements, fetch only what you need
Collections | hard to use, lazy loading issues, performance and serialization problems | plain old Java in-memory collections, no surprises
Transactions | unnecessary session abstraction used to resolve entity identities | deferred to JTA/JDBC transactions, no redundant APIs
Locking | optimistic locking support | deferred to database locking via JTA/JDBC transactions
BLOB/CLOB Handling | driver-supplied java.sql.Blob/Clob or materialized variants (String, byte[]) | dedicated methods that operate on java.io streams/readers/writers; fully interoperable with materialized variants (String, byte[])
Caching | built-in integration with third-party providers | reference table dictionaries, or caching coded explicitly by the programmer
Performance | unobvious, hard to tune, n+1 select problems, lazy loading, pervasive reflection | close to JDBC, operations map 1:1 with SQL statements, direct ResultSet processing
Schema Maintenance | manual code updates (HQL, annotation strings) | regenerate DAOs and constants, every incompatibility is a compilation error

 

ORM Philosophy

The most important thing that separates OneWebSQL from Hibernate is how we approach the problem of interactions between applications and databases. For OneWebSQL, the database is a first-class citizen and an integral part of the application, not something you should abstract away or separate from the application, even if that were possible (which it is not).

Hibernate takes an application-centric approach to the database design. This makes the database schema a second-class citizen to the application's domain model, and introduces a number of issues down the road. The result is half-baked solutions to problems that could have been avoided with the data-centric approach.

OneWebSQL knows that data is king; we believe in a data-centric approach to database design. Data always outlives the application that created it. Data always needs to be accessible outside a single application: there is always some external reporting, business intelligence, analytics, or other application that needs access to the same underlying data. Instead of trying to shoehorn the database into an object-oriented design, you should just use it the way it's supposed to be used.

Hibernate makes the assumption that your domain model acts as the database access layer. We believe that this supposition is the source of some serious problems. First, it violates the principle of separation of concerns. Second, it hides the database under the wrong abstraction, making it much harder to use, optimize, and debug than it should be.

In contrast, OneWebSQL encourages you to think about data processing while you code your application. The domain model is supposed to be the best possible programming abstraction of the business problem, so it needs to be carefully designed and coded by the programmer from the ground up. The database access layer, on the other hand, can and should be automatically generated from the database schema. It's the programmer's job to map the domain model methods to optimal data processing operations.

Hibernate tries to do these mappings automatically, but no default can work well in every scenario. You're better off making these decisions yourself right from the start, rather than using "smart" defaults, and then gradually turning them off in every piece of your application until you're doing them all manually anyway.

With OneWebSQL, you use SQL directly to interact with the database. Instead of manually coding JDBC, you can use pregenerated DAOs and leverage IDE code completion to write your SQL in an object-oriented, type-safe domain specific language (DSL).

Hibernate deals in domain models, entities, collections, and inheritance. OneWebSQL deals in tables, views, rows, and data.

Every design decision in OneWebSQL stems from this philosophy.

Schema Definition

Hibernate makes schema definition implicit. It's all hidden in XML files and annotations. It's hard to see the big picture of the database. It's hard for the database administrator (DBA) to grasp what's going on, because DBAs don't usually know Java. Hibernate locks the database schema in code-land, and sides with programmers on the schema ownership issue. None of the standard Hibernate tools, be it SchemaExport or SchemaUpdate, supports anything more than the most basic database features. No vendor-specific extensions, no specialized indexes, nothing.

OneWebSQL uses explicit schema definition. The database access layer code is generated straight from the ERD diagrams of the modeling tools. Everything is centralized and easy to inspect and modify by the database administrator (DBA). If you need them, you can use all advanced database features and vendor-specific extensions. Consequently, schema ownership is explicit. DBAs and programmers can both edit the schema. Schema updates and data migrations are handled by industrial-strength database modeling tools.

Code Generation

Unless you use a third-party extension, Hibernate requires you to code all of your entities by hand. For all but "toy projects", there comes a time when you need greater control over schema design, such as the names of the tables, column types, or constraints. In Hibernate, all of those are specified in Java annotations as strings. If you want to refer to them later in your queries, you need to make sure that you have typed them exactly the same.
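
For illustration, here is what a typical Hibernate/JPA entity looks like, using a hypothetical BOOK table; the table and column names are plain strings that the compiler cannot verify.

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.Table;

    @Entity
    @Table(name = "BOOK")                    // table name is an unchecked string
    public class Book {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        @Column(name = "BOOK_ID")            // column name is an unchecked string
        private Long id;

        @Column(name = "TITLE", nullable = false, length = 200)
        private String title;

        // Any query that later refers to "title" or "TITLE" must repeat the
        // name exactly; a typo surfaces only at run time.
        public Long getId() { return id; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }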

OneWebSQL generates the database access layer straight from the database model. The generated code takes advantage of the 80–20 rule as applied to database interactions. The generated data access objects (DAO) provide the most frequently used SQL operations on the database tables: querying, filtering, sorting, CRUD modifications, aggregates, and more. If you ever need to change any table or column, you just need to regenerate the code, and, a few seconds later, all your DAOs reflect the new schema.

Querying

Querying data in Hibernate comes in multiple flavors: HQL, JPQL, JPA Criteria API, and legacy Hibernate Criteria API. Navigating associated entity collections is also a kind of querying, although limited in use. Finally, you can fall back to native SQL when HQL or Criteria API won't be efficient enough for your use case. Each of these tools has its strengths and weaknesses.

JPQL and HQL

JPQL and HQL are implementations of the same idea that differ only in minor details, so we'll discuss them together. HQL is basically stripped-down SQL with object-oriented extensions that better reflect Hibernate's conceptual model of data.

The foremost shortcoming of HQL is that it is based on the notion of data as "nothing but objects". This makes some kinds of queries, especially those involving unrelated (i.e. unassociated) entities, hard or even impossible to express clearly and efficiently. Thinking about data in terms of objects, instead of tables, seriously limits your freedom in what can be easily and efficiently queried, compared with the full power of SQL.

Second, when you look at an HQL query, it's not immediately clear whether it will result in a single SQL statement, make a JOIN, or have an n+1 select problem. Moreover, performance can crumble unexpectedly if someone adds entity associations in seemingly unrelated parts of the code. The result is poor performance with no easy way to improve it. To learn more about how Hibernate's poor querying capabilities negatively affect performance, check out the section about lazy loading.

A third problem is that all HQL queries are Java strings. This means that the Java compiler can't analyze them and check the code for syntax errors or mistakes in entity names during compilation. All error checking is deferred until run time. The issue becomes more and more severe with each consecutive refactoring. To counter this, programmers have to put lots of effort into testing each new database change.
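
A minimal sketch of the problem, reusing the hypothetical Book entity from above: the query is an opaque string, so a misspelled property name compiles cleanly and blows up only when the query runs.

    import java.util.List;
    import org.hibernate.Session;

    public class BookQueries {
        @SuppressWarnings("unchecked")
        public List<Book> findByTitle(Session session, String title) {
            // "titlle" is a typo, but the compiler cannot look inside the string;
            // Hibernate reports the unknown property only at run time.
            return session.createQuery("from Book b where b.titlle = :title")
                          .setParameter("title", title)
                          .list();
        }
    }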

OneWebSQL's design is explicit about the relational, remotely-accessed nature of databases. Accordingly, OneWebSQL uses SQL directly instead of an impoverished, artificial language like HQL. This gives programmers granular control over what is pulled from the database and yields high performance that Hibernate can't match. You just query tables and get rows in return. No complex data transformations, no Java reflection, no lazy loading; it is as fast as getting data from a ResultSet.

Unlike HQL or JDBC, OneWebSQL uses a type-safe, embedded DSL to build queries. Rather than strings, OneWebSQL uses an object-oriented query builder and constants generated from the database schema. The benefits are twofold.

First, it makes all queries compile-time safe. It's impossible to make syntax errors writing queries in embedded DSL, because invalid ones simply won't compile. DAOs and DSL queries use generated constants that describe names of tables and columns as well as their types. If any one of them changes, so does the corresponding constant. So if your schema changes, the compiler flags all incompatible modifications as errors.

Second, it speeds up programming. IDE autocompletes DSL methods and generated constants, so query programming is really fast. When you're done coding, your query has no syntax errors and no spelling errors in table or column names.
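
To make the idea concrete, here is a purely hypothetical sketch of such a query; the BOOK constant, its column constants, and the bookDao methods below are illustrative assumptions, not the actual OneWebSQL API.

    // Hypothetical sketch only; the names are illustrative, not the real OneWebSQL API.
    // BOOK and its column constants are assumed to be generated from the schema,
    // and bookDao is assumed to be a generated data access object.
    List<BookDTO> recent =
            bookDao.select()
                   .where(BOOK.PUBLISHED_YEAR.gt(2010))   // typed column constant
                   .orderBy(BOOK.TITLE.asc())
                   .list(connection);
    // If PUBLISHED_YEAR is renamed in the schema, the regenerated constant
    // changes and this code stops compiling.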

The Criteria API

The Hibernate Criteria API brings compile-time type safety to Hibernate queries. Each query is represented as a data structure that resembles a query parse tree. Criteria querying is independent of HQL querying, and there are subtle differences in what can be queried using one versus the other.

Although well-intentioned, the Criteria API quickly becomes cumbersome to work with. The problem is that the programming interface is too low-level. When constructing queries, people tend to think in terms of what they want to fetch from the database, not in terms of syntactic query structure. But the Criteria API forces them to manually build a query parse tree equivalent from low-level blocks. For anything more complicated than a simple fetch with a few conditions, the code grows unnecessarily big and full of irrelevant details. The programmer shifts from thinking about data to managing accidental complexity of Criteria API. The query becomes unreadable, hard to change, and hard to reason about.
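
For a feel of that verbosity, here is a small JPA Criteria query against the hypothetical Book entity; even a simple filter-and-sort needs a fair amount of structural scaffolding.

    import java.util.List;
    import javax.persistence.EntityManager;
    import javax.persistence.criteria.CriteriaBuilder;
    import javax.persistence.criteria.CriteriaQuery;
    import javax.persistence.criteria.Root;

    public class BookCriteriaQueries {
        public List<Book> findByTitlePrefix(EntityManager em, String prefix) {
            CriteriaBuilder cb = em.getCriteriaBuilder();
            CriteriaQuery<Book> query = cb.createQuery(Book.class);
            Root<Book> book = query.from(Book.class);                        // FROM Book
            query.select(book)
                 .where(cb.like(book.<String>get("title"), prefix + "%"))    // WHERE title LIKE ?
                 .orderBy(cb.asc(book.get("title")));                        // ORDER BY title
            // Note: the "title" path is still a string here; full type safety
            // requires the generated static metamodel (Book_.title) as well.
            return em.createQuery(query).getResultList();
        }
    }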

OneWebSQL's embedded DSL is based on the fluent interface design pattern. It is a higher-level interface than the Criteria API. Using it produces shorter and more readable code that better reflects the intent and structure of plain-text SQL queries. Query modifications are easier, and it still preserves the same level of type safety as the Criteria API.

Native SQL

Hibernate offers an option to use the native SQL of the underlying database in cases when you need that last extra drop of performance out of your application. SQL gives you greater control over what data is retrieved from the database and provides full access to every database function available. Such power, however, comes with a substantial programming cost.

Every SQL query is represented by a string in Java code. This makes them totally opaque to the compiler, just like HQL queries. Syntax, naming, or type conversion errors can't be detected by the compiler, and result in exceptions thrown during program execution. On top of that, Hibernate has to manage the names and data types used in native queries and map them to the entity and property names in Java code. This imposes additional constraints on the programmers, who have to "pollute" their SQL queries with Hibernate-specific entity aliases, or supply extra mapping information in XML files or annotations. This problem is especially painful during database changes, when every native query, mapping file, and annotation has to be reviewed, fixed, and tested by the programmers. This makes working with SQL queries in Hibernate only marginally better than using JDBC directly.
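
A small illustration of that extra mapping burden, using the older Hibernate Session API and the hypothetical Book entity; the {b.*} placeholder and the addEntity() call exist only so that Hibernate can map the raw columns back onto the entity.

    import java.util.List;
    import org.hibernate.Session;

    public class NativeBookQueries {
        @SuppressWarnings("unchecked")
        public List<Book> findByTitlePrefix(Session session, String prefix) {
            // Plain SQL, except for the Hibernate-specific {b.*} alias placeholder.
            return session.createSQLQuery(
                       "select {b.*} from BOOK b where b.TITLE like :prefix")
                   .addEntity("b", Book.class)
                   .setParameter("prefix", prefix + "%")
                   .list();
        }
    }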

OneWebSQL queries have the same expressiveness and performance as native SQL queries. Admittedly, they can be less readable than SQL queries written in plain text, but that's the nature of every embedded Java DSL. However, the benefits of compile-time type safety more than pay for themselves. Building queries in an embedded DSL is the best trade-off between performance, readability, and type safety.

Querying Summary

Let's review the pros and cons of different approaches to querying the database. The table below compares Hibernate HQL/JPQL, the Criteria API, native SQL, and OneWebSQL DSL. Every method is described in terms of the following characteristics:

  • Performance — reflects the degree of control offered to the programmer to build high-performing queries. The easier it is to get to the optimal performance of the database, the better.
  • Type safety — indicates the amount of support offered by the language, compiler, and programming environment in query construction, error detection, and maintenance.
  • Readability — how easy it is to read the code and see the intent of the query. The less accidental complexity and the fewer irrelevant details, the better.

Approach | Performance | Type Safety | Readability
HQL/JPQL | low | no | high
Criteria API | low | yes | low
Native SQL | high | no | high
OneWebSQL DSL | high | yes | medium

 

OneWebSQL's type-safe, object-oriented, embedded DSL gives you the best of both worlds: the power and expressiveness of SQL, coupled with the type safety of compiled code.

Inheritance

Hibernate natively supports entity inheritance. It provides three possible strategies for implementing inheritance in relational databases: table-per-class-hierarchy, table-per-subclass, and table-per-class. This feature is very convenient and easy to use, so programmers can quickly start coding applications using their familiar, object-oriented paradigm.

The problem with this approach is that object-oriented design is nothing like proper relational database design. Rows in a database table are not entities; they are statements of facts about the world. If a row is present, the fact is true; conversely, its absence means the fact is false. Each object can group data from multiple tables, and each table can store data from multiple objects of different classes. Trying to bend tables to serve as one-to-one containers for entities is destined to fail. You end up with one of the three suboptimal strategies, which look more like relational design anti-patterns. All three require complicated SQL queries to fetch and modify the data, and those queries perform poorly in most use cases.

On a less fundamental and more technical level, Hibernate inheritance interacts badly with lazy loading. Hibernate requires the programmer to use the instanceof operator to implement the equals() method. If you don't, you break data consistency. If you do, you break the symmetry requirement of the equals() contract. Moreover, in SQL the equality of rows is symmetric, so Hibernate entities have different equality semantics than database rows do.

In contrast, OneWebSQL does not support inheritance directly. If you start with the relational notion of data, it's easy to design a simple and well-performing database schema. It's equally easy to create queries or views to store entities in the tables. If your use case lends itself to one of the three inheritance mapping strategies, you design the tables yourself and OneWebSQL generates the corresponding DAOs and DTOs for your application; you then manage inheritance yourself by manipulating specific rows in the application. The recommended OneWebSQL approach, however, is to design a good database schema, and then to create views optimized for querying data in your application. Data modifications are made by issuing SQL inserts, updates, and deletes against the appropriate tables.

Lazy Loading

In Hibernate, all associations between entities are lazy by default. If it weren't for lazy loading, Hibernate's performance would completely break down.

Why? Without it, for every loaded entity, Hibernate would also load all associated entities, all entities associated with associated entities, and so on ad infinitum. A simple query could fetch a potentially huge number of objects, even if only a single field from the root entity is needed. To avoid this, Hibernate loads entity objects without their associated entities. Only when the application code needs them does Hibernate retrieve the data from the database.

Lazy loading has unavoidable performance problems.

If you need to access a lot of related entities, Hibernate makes a new database call to load each one of them. Because remote database calls are expensive, such behavior is very inefficient. It's much more cost-effective to retrieve all associated entities in a single database query. This undesired behavior is known as the "n+1 select problem".
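
A minimal sketch of the pattern, assuming hypothetical Author and Book entities with a lazy one-to-many association; the innocent-looking loop issues one extra SELECT per author.

    import java.util.List;
    import org.hibernate.Session;

    public class AuthorReport {
        @SuppressWarnings("unchecked")
        public void printBookCounts(Session session) {
            // 1 SELECT for the authors...
            List<Author> authors = session.createQuery("from Author").list();
            for (Author author : authors) {
                // ...plus 1 SELECT per author to initialize the lazy books
                // collection: the classic n+1 select problem.
                System.out.println(author.getName() + ": " + author.getBooks().size());
            }
        }
    }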

To mitigate it, Hibernate lets the programmer declare which associations should be fetched eagerly. But this brings us back to the original problem. If there are too many associated entities, they won't fit in the application memory. And even if they do fit, Hibernate still retrieves too much data, wasting network bandwidth and memory, which is the main cause of performance problems. This can be alleviated somewhat by finding the proper combination of different knobs and levers used by Hibernate to tune lazy loading: fetch annotations, fetching associated entities in batches, setting the maximum fetch depth, lazy property fetching, etc.

All the above mechanisms, however, are crude and limited in use, because they are global, either for the whole application or for a given association. But no single fetching strategy can perform well in all possible usage scenarios. If you need greater control, or want to squeeze the last ounce of performance from the database, you have to resort to using HQL. This, however, brings us full circle back to where we started. Why use HQL then, the poor man's subset of "the real deal" SQL?

All OneWebSQL operations are eager. There is no need for lazy loading because OneWebSQL uses SQL directly. Every operation maps one-to-one to an SQL statement. This gives the programmer full control over what is pulled across the network, so only the minimum amount of data needed for a given situation is fetched. Properly structured SQL queries yield performance unmatched by Hibernate. All the data you need, and only the data you need, is retrieved from the database, exactly when you need it.

Hibernate lazy loading is cumbersome and unpredictable.

Hibernate uses the concept of a session to maintain data consistency across multiple operations. It corresponds roughly to a database transaction, but also has a few other functions. One of them is the lazy loading of associated entities. Entities loaded by Hibernate are attached to a session and exposed via proxies. If the application requests some lazily-loaded entity, it is retrieved from the database using the attached session. This way, Hibernate ensures that all lazily-loaded entities are accessed within the same transaction. It's absolutely required to maintain data consistency within the application.

Such an arrangement, however, makes lazy loading of entities break outside of the session context. If you try to access a lazily-loaded entity after your session closes, Hibernate throws a LazyInitializationException. There are two solutions to this problem. You can either keep the session open for the duration of the business operation (so-called "open session in view" pattern) or manually preload all lazily-loaded entities before closing the session. Either way, you need to restructure your code to account for the lazy loading whims of Hibernate.

Hibernate exposes lazily-loaded entities via proxies. The problem with proxies is that, unless you take special care in your code, they mess with standard object semantics, compared with plain, old Java beans:

  • You can't access entity fields directly, but instead have to use setter and getter methods. Otherwise, Hibernate won't lazily load associated entities. If proxies are Java dynamic proxies, then instance fields will always be null, because the entity data is stored in a delegate object, accessible only via getter and setter methods. It's especially easy to fall into this trap when you implement equals() and hashCode() methods.
  • You can't compare proxies directly, but instead have to use the instanceof operator or go through a special method from the HibernateProxyHelper class. Therefore you can't implement the equals() and hashCode() methods conventionally, but have to do it the "Hibernate way" (see the sketch after this list). You need to know the peculiarities of how Hibernate works internally to code it properly. Otherwise, you won't be able to safely use entities outside of a session. If you try to reattach them later, you're likely to encounter "strange" Hibernate behavior, such as duplicate objects in collections.
  • You can't serialize an entity without manually preloading all associated entities. This time, however, Hibernate won't throw an exception, but will silently serialize the original entity WITHOUT entities that weren't loaded earlier. To fix this, you have to manually initialize all lazily-loaded entities before serialization.
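
As an illustration of the second point, this is roughly what a proxy-tolerant equals()/hashCode() pair looks like for a simplified Book entity: getters instead of fields, instanceof instead of getClass(), and a business key (a hypothetical ISBN column here) instead of the generated database identifier.

    public class Book {
        private Long id;       // generated database identifier; intentionally not used in equals()
        private String isbn;   // hypothetical natural (business) key

        public String getIsbn() { return isbn; }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Book)) return false;   // getClass() would reject Hibernate proxies
            Book that = (Book) other;
            // Compare through getters so a proxy gets initialized, and through the
            // business key so detached and managed instances still match.
            return getIsbn() != null && getIsbn().equals(that.getIsbn());
        }

        @Override
        public int hashCode() {
            return getIsbn() == null ? 0 : getIsbn().hashCode();
        }
    }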

The amount of work needed to work out all the details associated with lazy loading, its complex interactions, and its impact on the application is huge. So big that experienced programmers who know Hibernate really well are said to have a "PhD in Hibernate". Having anything less than a good grasp of what's going on inside Hibernate leads to unpredictable behavior and performance problems.

OneWebSQL, in comparison, does not need all that lazy loading infrastructure. All records are retrieved within the context of a single database transaction. Every object returned by OneWebSQL is lightweight, fully constructed, and ready to use just like any other plain Java bean. They have working equals() and hashCode() out of the box, can safely be put in collections, compared, and serialized. All data requested by a programmer is already there, so there's no need to fetch additional data at some unspecified time later. Consequently, there is no error situation equivalent to LazyInitializationException.

Lazy Loading Summary

Lazy loading, even with its numerous configuration mechanisms, doesn't provide enough control to match the performance of SQL. Hibernate's misguided notion of data as "nothing but objects" lacks the capabilities needed to effectively query a database over a network where each remote call is expensive. Lazy loading is a complex and inadequate workaround for Hibernate's poor querying capabilities; it actually creates more problems than it solves.

OneWebSQL, in contrast, sides with relational databases on how data should be designed, accessed, and modified. OneWebSQL uses SQL directly, enabling granular control over what is fetched from the database and when it is fetched. This approach reflects the reality of accessing a database over a network, where each data transfer is costly. It brings unmatched performance and simplifies the programmer's conceptual framework. The database access infrastructure is lightweight, predictable, and fast.

Collections

Hibernate uses the Java Collections API to represent different types of associations between mapped entities, as well as collections of embedded elements. This was a poor design decision, because the standard Collections API was designed with different goals and trade-offs in mind than the Hibernate use cases. The Collections API operates on fast, in-memory objects contained within a single process. Hibernate uses it to access remote, orders-of-magnitude-slower, persistent data. The result is that Hibernate association mappings are among the most difficult things to implement correctly.

To maintain the illusion of data as "nothing but objects", Hibernate implements lazy loading underneath the Collections API. This undesirable overloading of concepts makes it hard for the programmers to efficiently and predictably program with Hibernate collections. Although they are visually similar to Java collections, they have totally different semantics and performance. Suddenly what seemed like a simple method call throws a LazyInitializationException or makes the application grind to a halt. The programmer can't easily know whether a collection access will result in a single SQL query, a join, or the dreaded n+1 select problem.

OneWebSQL, in comparison, uses standard, in-memory Java collections directly to represent the results of SQL queries. They are fast, predictable, and have familiar performance characteristics. OneWebSQL fully populates every collection eagerly, so when it's accessed in user code, it's complete and ready to use.

Conceptually, Hibernate makes an unnecessary distinction between collections of "entities proper" and collections of so-called "embedded elements". Both of these are represented by tables in a database, but in Hibernate they behave and are treated quite differently. They are annotated differently, have different life cycles, and have different querying capabilities in HQL. Entities are navigable, whereas collections of elements are not. Entities can be fetch joined eagerly; embedded elements cannot. Working with collections of associated entities requires the programmer to specify relation multiplicity (one-to-one, one-to-many, many-to-one, and many-to-many) and direction (unidirectional or bidirectional). Collections of embedded elements, however, are simple tables related by a foreign key to the table of the embedding entity.

The standard mapping combinations offered by Hibernate often result in nonstandard, non-obvious, and suboptimal table structures, as seen from the perspective of the database. Properly specifying entity mappings requires prior knowledge of table record counts and data access patterns in the application. Without it, performance will plummet. All of this complexity exists to shoehorn a few simple relational concepts into an object-oriented world view. On the database level, all these are still simply tables that reference each other via foreign keys.

OneWebSQL works with tables, foreign keys, and joins directly. You simply JOIN whatever tables you need, and you don't have to decide beforehand the way the application will access your data, nor worry that if you guess wrong you'll pay through the nose in terms of performance or querying power. Instead of learning lots of specific cases for how to map your entities to database tables, just design the tables and declare their foreign keys. Such an approach is capable of handling all the distinct types of mappings supported by Hibernate.

Finally, Hibernate collections, because they implement lazy loading, have serious problems with serialization. To serialize a Hibernate collection, you need to manually preload all of the data in advance; otherwise serialization fails silently, and you end up with a broken, partially-populated collection where only the elements loaded before serialization are present. Moreover, serialization leaves the original entities still managed in a session. This means that a deserialized collection contains detached copies of the entities, and you'll need to reattach them to a Hibernate session in the application.

OneWebSQL uses plain old Java objects (POJOs) to represent database data. All generated objects are serializable. All collections created by OneWebSQL are eager and fully populated before they are returned to the application code. They are completely safe and predictable for the purposes of serialization.

Transactions

Hibernate makes the assumption that entities live longer than a single database transaction. Operations on a set of related entities are grouped into sessions. Sessions represent larger units of work and can span multiple transactions. The main role of the session is to manage entity identities. This means that for each entity within a session, there is only one object instance that represents a given chunk of data from the database. If there were two distinct objects that represented the same entity, then this could lead to errors from data aliasing. Hibernate is vulnerable to entity aliasing effects; hence, it has to manage entity identities, and needs the session abstraction.

Identity management is complex and expensive, so Hibernate takes on additional responsibilities to accomplish this task. It caches queries that return entities for the lifespan of a session. It loads associated entities lazily within a session. It manages the life cycle of entities that need to be detached from or attached to a session. On top of that, Hibernate must coordinate sessions with their underlying database transactions.

The result is that the session abstraction is leaky, badly performing, and unpredictable. It's easy to trip the wire of a LazyInitializationException. It's hard to control when SQL statements are issued, because Hibernate batches most of the operations until transaction commit. It's hard to implement the equals() method properly without resorting to a "business equality" concept ("candidate key" in database terms). It's impossible to maintain the "single entity instance within a session" illusion in the face of serialization. That's why you have to manage the life cycles of entities and "reattach" them after deserialization. If a Hibernate session throws an exception, it becomes unusable and has to be discarded. Unlike with database transactions, you can't recover all the work up to the moment of the exception, but must start the whole operation from the beginning.

OneWebSQL assumes that objects retrieved from the database are short-lived and are just a window into the current database data, lasting for the scope of a single transaction. The typical use case of OneWebSQL objects is aligned with SQL recommended use. It consists of retrieving and manipulating a minimal amount of data from the database, as quickly as possible, within the scope of a single database transaction.

OneWebSQL does not manage object identities and uses value equality to compare objects. If two DTOs have the same fields, they are considered equal, just as in the database where tuples of the same size with the same values are equal. There can be any number of objects that represent the same table row and, from the OneWebSQL point of view, they are perfectly interchangeable. There is no need to cache and resolve objects within the scope of a session.

OneWebSQL does not batch or defer SQL. Statements and queries are issued when the programmer calls the appropriate method. The objects returned to the application code are simple, fully constructed, serializable Java beans. There is no equivalent of LazyInitializationException, no hard-to-avoid locking problems due to SQL batching, and no need to handle serialization in any special way.

Consequently, OneWebSQL does not need the session concept and the whole infrastructure to support it. Programmers work directly with the transactional facilities of the underlying database, without the need for unnecessary abstraction. Sessions, in fact, poorly duplicate the job of the database. Caching, isolation, and consistency are handled directly by the database and JDBC or JTA transactions. The database can provide these for more than a single application, without the cost of sending data over the network.

In OneWebSQL, longer units of work are maintained using application-specific mechanisms (e.g., via HTTP session). If there is a problem with an SQL statement, the application can reasonably recover because JDBC and JTA transactions, unlike Hibernate sessions, are not invalidated by SQL exceptions.

Locking

Hibernate supports optimistic and pessimistic locking. Optimistic locking is based on versioning or timestamping and implemented entirely in Hibernate. Pessimistic locking is delegated to the underlying database engine and is accessible via a unified interface provided by Hibernate.

Locking problems can result in poor performance, or in the application hanging due to deadlocks. Debugging and fixing them requires deep knowledge of the underlying database engine, as well as precise insight into access patterns and interactions between transactions competing for the same data.

From this point of view, Hibernate optimistic locking adds an unnecessary layer of complexity. It has its share of problems, such as query overhead, timestamp support in the database, and lack of timestamp precision, which can lead to concurrency bugs. To properly use optimistic locking, one needs to know precisely how it works and how it interacts with the native locking of the underlying database. In Hibernate, some SQL statements go straight to the database, some are resolved using the first-level cache, some are deferred until commit time, and some support optimistic locking. It's hard to see what exactly happens under the hood.

Fixing locking problems typically requires restructuring the SQL operations in the application. Given the lack of control over what SQL operations Hibernate issues and when, it's difficult to change the application code to fix the locking problem. In common practice, locking problems are fixed by abandoning Hibernate's locking management and implementing the crucial database operations in native SQL.

Hibernate's pessimistic locking, although well-intentioned, does not give any significant advantages over working with native database engine locking modes directly. The programmer still needs to know how abstract Hibernate lock modes map to the underlying database locking modes, how they behave, and how to structure the code accordingly.

OneWebSQL supports only the native locking provided by the underlying database engine. There is neither support for optimistic locking nor a specialized locking interface to unify the different, often incompatible, locking modes of different databases. All necessary locking is performed by the database within the context of a JDBC or JTA transaction.

We believe that in the case of a serious locking problem, the programmer is best served when she can specify locking directly in the database engine's SQL dialect and restructure the code appropriately. If the application requires some kind of optimistic locking, it's possible to implement this above OneWebSQL's building blocks, taking into account the circumstances of the specific application and the database.

BLOB/CLOB Handling

Hibernate provides two ways to use LOB data in the application. Large object columns can be accessed directly via the java.sql.Blob and java.sql.Clob interfaces, or they can be materialized, as byte arrays or as strings, respectively. Because working with JDBC interfaces can be inconvenient, and support for them in the drivers is often patchy, Hibernate users opt for the materialized variants. Materialized versions behave consistently and can be used safely outside of a transaction, but at the additional cost of greater memory consumption.

OneWebSQL treats LOB columns specially. They aren't part of the generated DTOs, but each has its own set of dedicated manipulation methods in the DAO. One subset of those methods provides exception-safe and close-safe operations on BLOB and CLOB columns. The application provides the appropriate stream (reader/writer for the CLOB column), and OneWebSQL automatically pumps data between streams, taking care of the necessary cleanup and exception handling. Another subset of methods is used to materialize LOB columns and to update the database with data from materialized objects like strings and byte arrays. OneWebSQL works with streams by default. Methods that operate on materialized variants are for programmer convenience only. All methods for a given column are fully interoperable.
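
As a purely hypothetical illustration of the kind of call this implies (the DAO and method names below are invented, not the actual OneWebSQL API), streaming a BLOB column to a file might look roughly like this.

    // Hypothetical sketch only; documentDao and readContentBlob() are invented
    // names, not the actual OneWebSQL API. The DAO is assumed to pump the BLOB
    // data into the given stream and to handle cleanup and exceptions itself.
    try (java.io.OutputStream out =
             java.nio.file.Files.newOutputStream(java.nio.file.Paths.get("attachment.pdf"))) {
        documentDao.readContentBlob(connection, documentId, out);
    }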

Caching

Hibernate is chatty. It commonly makes a few SQL calls for each method call. To reduce the number of SQL roundtrips and to boost performance, Hibernate provides its complex caching infrastructure. There are four types of cache in Hibernate: session cache, query cache, update timestamps cache, and second-level (L2) cache. Each has a different purpose and different impact on the overall performance.

Session cache

The session cache is used to resolve entity identities. Every time the application requests a database entity, Hibernate returns the same object instance. This behavior is required to avoid the aliasing effect to which Hibernate is vulnerable. Every object returned from a lookup method, the Criteria API, or an HQL query is resolved in the session cache. The session is almost nothing about caching and everything about ensuring consistency. Speeding up entity lookups by identifier is an unintended side effect of the implementation, not an explicit design goal. Treating sessions as a cache is a primary source of trouble for Hibernate programmers.

Entity caching in a session can't be disabled. If your application uses too many entities within a single session, it will fail with an OutOfMemoryError. You can't turn it off or configure a sensible automatic eviction policy like LRU. Instead, you have to decide in the application which entities you don't need any more, and evict them manually. Moreover, HQL bulk updates can lead to inconsistencies in the session cache. Hibernate does not know which entities were updated by the bulk operation, so the programmer has to manually evict all the affected entities. It's a hard problem that database implementers have spent years working on, and it's equally hard for the programmer. When it's not obvious what to evict, the only option is to clear the whole session, or even to put the bulk operation in a different session altogether.
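
A short illustration of that manual bookkeeping, assuming a hypothetical Account entity: after the bulk update, any Account instances already loaded in the session are stale until the programmer evicts or clears them.

    import org.hibernate.Session;

    public class AccountMaintenance {
        public void closeSuspendedAccounts(Session session) {
            // The bulk HQL update goes straight to the database...
            session.createQuery("update Account a set a.active = false where a.status = :status")
                   .setParameter("status", "SUSPENDED")
                   .executeUpdate();
            // ...but Hibernate does not know which cached Account instances it
            // affected, so the session cache has to be cleared by hand.
            session.clear();
        }
    }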

Query cache

Session cache speeds up looking up entities by their identifiers. It's easy to check whether the relevant entity is already in the session, and then return it as needed. This is not so easy with queries, because Hibernate can't tell what entities a particular query will return. It has to execute the query every time, and then, only after the database returns the data, resolve entities with the session cache. Query cache, and its companion, the update timestamps cache, were introduced in Hibernate to avoid this additional database hit on every query.

Hibernate can guarantee that query results are valid as long as the underlying database tables were not modified. To ensure consistency, it records all table modification times in the timestamps cache. Every time results are retrieved from the query cache, Hibernate has to check the timestamps cache for all tables used in a query. If any table was modified since the time the query was cached, the result is removed from the cache and the lookup is a miss. This implementation has a few problems.

Query cache invalidation is very crude, and query results are removed from the cache far too often, even when they could still be valid. In effect, not many SQL queries are saved.

Timestamps cache has a single coarse lock that is used for all database modifications and for every lookup through the query cache. In applications with a high volume of transactions, this is another source of lock contention, even if query caching is disabled.

Query caching works differently for different querying mechanisms. It is most useful for queries based on a natural ID (unique or candidate key in database lingo). This, however, can be supplied only by the Criteria API, not by other means, like a Query object or HQL query.

Each query cache entry is expensive in terms of memory. Its key contains the query string itself, all bind parameters, and the query retrieval time. This cost becomes significant in applications that cache a lot of queries that return a single entity or a small number of entities.

Finally, to enable query caching, the programmer needs to modify the source code and explicitly declare whether a particular query is cacheable, and then recompile and redeploy the application.
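
For reference, this is roughly the source change involved, shown with the older Hibernate Session API; the query cache must also be switched on globally (hibernate.cache.use_query_cache) before the per-query flag has any effect.

    import java.util.List;
    import org.hibernate.Session;

    public class CachedBookQueries {
        @SuppressWarnings("unchecked")
        public List<Book> findByTitle(Session session, String title) {
            return session.createQuery("from Book b where b.title = :title")
                          .setParameter("title", title)
                          .setCacheable(true)   // per-query opt-in, hard-coded in the source
                          .list();
        }
    }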

Given the above, query cache is rarely useful in practice, and thus disabled by default in Hibernate. From the database point of view, the Hibernate approach to query caching is completely misguided. It's the role of the database cache to solve the problem of caching query results and to guarantee their consistency.

Second-level Cache

Second-level (L2) cache is shared between sessions. Rather than the entities themselves, it stores raw, immutable data retrieved from the database. This design protects the cache from unwanted, concurrent modifications of cached data by multiple, competing Hibernate sessions. Whenever Hibernate creates an entity, instead of issuing an SQL query, it will try to reuse the data from the second-level cache. Each session has its own entity instance that shares underlying data with entity instances from other sessions.

The Hibernate L2 cache is closest to the user's notion of an application cache. It provides caching for business entities and saves some database operations. However, its usage, configuration, and tuning are still far from optimal, because all L2 cache implementation details have to be managed explicitly by the programmer.

To properly use and tune the L2 cache, the programmer needs to know how the Hibernate second-level cache works internally. One needs to know how the raw data is used to reconstruct the entities. What are the cache concurrency strategies, and which one should be used? Which concurrency strategy is provided by which external cache provider? What's the difference between caching data for entities and caching data for associations? How do you define, use, and invalidate cache regions? What are the interactions between the query cache, the timestamps cache, and the second-level cache, and how do you specify them using one of the five available cache modes? How do external applications update the database, how does that impact Hibernate caching, and how do you account for it in the code?

When you dig deeper and look closely at the data structures of the L2 cache, they eerily resemble database tables stored in application memory. But caching table data in memory is exactly the job of the database. The Hibernate second-level cache, in comparison, needs much more effort and hand-holding from the programmer to work efficiently. When you look at the second-level cache as a cache for business entities, it's still less optimal than explicit caching and invalidation in the client code. Every time Hibernate returns an entity for a new session, it has to be reconstructed from the raw data. Every modification needs to go through the timestamps and L2 caches. Conversely, the user code that retrieves or invalidates a business entity in an application cache is simple, fast, and precise. It's just a simple hash map lookup or removal.

The Hibernate L2 cache tries to reach two different performance goals simultaneously and, as a result, excels at neither of them. One goal is to curb the excessive data retrieval from the database caused by Hibernate's poor querying capabilities. OneWebSQL, by contrast, uses SQL to query the minimum amount of data necessary before sending it over the network; this approach is economical enough to eliminate the mandatory caching of all database operations. The second purpose of the L2 cache is to provide generic business entity caching. To protect cached data from concurrent modification, the cache stores raw data instead of entities, and every time an entity is requested, it needs to be constructed anew. This is suboptimal performance-wise and hard to use from user code. In short, Hibernate's second-level cache fulfills both of its objectives poorly.

Caching in OneWebSQL

The only kind of caching that OneWebSQL provides out of the box is caching of reference tables. Reference tables tend to remain constant over time, and their row data is commonly used in conditional expressions and switch statements in the application. They are called dictionary tables in OneWebSQL. For each dictionary table, OneWebSQL generates a Java interface that contains a constant for each row of the table. These constants contain data from the row and can be used directly in the code without the need to retrieve the data from the database.
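
A hypothetical sketch of the idea (the interface name and constant layout are assumptions, not the actual generated code): rows of a small ORDER_STATUS reference table become constants that application code can use without a database round trip.

    // Hypothetical sketch of a generated dictionary interface; the real
    // OneWebSQL output may be shaped differently.
    public interface OrderStatusDictionary {
        String NEW = "NEW";          // row ('NEW',     'Order received')
        String SHIPPED = "SHIPPED";  // row ('SHIPPED', 'Order shipped')
        String CLOSED = "CLOSED";    // row ('CLOSED',  'Order closed')
    }

    // Usage in application code, with no database access:
    // if (OrderStatusDictionary.SHIPPED.equals(order.getStatus())) { ... }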

OneWebSQL does not need equivalents of the session cache, query cache, or timestamps cache. These mechanisms were introduced to mitigate performance problems of Hibernate. For example, multiple copies of data returned by OneWebSQL are fully interchangeable, so there is no need for session cache-like infrastructure to resolve entity identities. There is no query cache and no timestamps cache in OneWebSQL, because these components misguidedly duplicate the task for which databases are optimized. The primary function of the database is to answer queries as fast as possible while maintaining data consistency. It is nearly impossible to beat its transaction management and sophisticated caching using Hibernate's approach. OneWebSQL relies on the decades of engineering effort that went into database implementations, and leaves consistency management and low-level caching to the database.

OneWebSQL does not have a generic second-level cache as implemented in Hibernate. Caching business entities explicitly in the application gives the programmer better control over caching policy and performance. OneWebSQL supports application-specific second-level caching via external caching libraries (e.g., Guava in-memory caches or JBoss Cache). Objects returned by OneWebSQL are lightweight, independent, and serializable, and can be used directly with any cache implementation. For maximum performance, the application usually needs to cache some higher-level, precomputed data structures instead of collections of rows. OneWebSQL DTOs fit naturally as part of a larger cached entity. The programmer can implement any desired caching policy explicitly, taking into account possible database modifications made by external applications.
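
A minimal sketch of such explicit, application-level caching with Guava; BookDao and BookDTO stand in for hypothetical generated OneWebSQL types, and the size and expiry settings are purely illustrative.

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.concurrent.TimeUnit;

    public class BookCache {
        private final LoadingCache<Long, BookDTO> cache;

        public BookCache(BookDao bookDao) {   // hypothetical generated DAO
            this.cache = CacheBuilder.newBuilder()
                    .maximumSize(10_000)
                    .expireAfterWrite(10, TimeUnit.MINUTES)
                    .build(new CacheLoader<Long, BookDTO>() {
                        @Override
                        public BookDTO load(Long id) {
                            // Cache miss: fetch the row in its own short transaction
                            // (findByPrimaryKey is an assumed DAO method name).
                            return bookDao.findByPrimaryKey(id);
                        }
                    });
        }

        public BookDTO get(Long id) {
            return cache.getUnchecked(id);
        }

        public void invalidate(Long id) {
            cache.invalidate(id);   // precise, explicit invalidation in user code
        }
    }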

Performance

OneWebSQL takes a relational stance on the concept of data and, therefore, makes different design trade-offs than Hibernate. There are three key areas in which these different design decisions make OneWebSQL much faster by comparison: first, its choice of querying mechanism; second, its lack of lazy loading; and third, its approach to data identity management and caching.

Querying

Hibernate provides three different querying mechanisms. On top of that, it fetches the data from the database implicitly every time user code uses entity associations. All of these mechanisms are crude and have non-obvious performance characteristics. Using them ensnares the programmer in a constant guessing game of what SQL statements Hibernate issued this time and why. Tuning Hibernate queries is hard, sometimes downright impossible, without resorting to native SQL.

OneWebSQL, on the other hand, uses SQL to query data from the database. Fetching data from the database is an expensive remote call for the application. Minimizing the number of database queries and the amount of data sent is the only way to optimize performance. Querying data with SQL gives the programmer full control over what data is sent from the database. One can specify the minimum amount of data required in the application, and only that data will be transferred over the network. For a more detailed description of the differences between Hibernate and OneWebSQL, refer to the Querying section of this document.

Lazy loading

Hibernate uses lazy loading to cut down on the amount of data loaded from the database via entity associations. Each entity can potentially be associated with a large number of other entities. In most cases, the user code will use only a few of them, or none at all. Hibernate, however, can't tell in advance which associated entities, if any, will be used. So it delays data retrieval until a particular association is used. This way, Hibernate maintains the illusion that all data is a network of interconnected objects. The cost of this illusion, nevertheless, is high. Lazy loading can only be configured globally for a given association, despite the fact that it's rare for one configuration to satisfy all entity usage scenarios. Either too much data is fetched from the database, or the infamous n+1 select problem appears. Lazy loading behavior is unobvious and hard to tune for maximum performance. It is no substitute for strong querying capabilities.

All OneWebSQL operations are eager. There is a one-to-one correspondence between OneWebSQL methods and SQL commands. The programmer decides exactly what data is retrieved from the database and when it is retrieved. OneWebSQL does not defer or batch operations until commit time. Database interactions are explicit, making performance tuning relatively easy and straightforward. To learn in depth how OneWebSQL and Hibernate differ in their approach to lazy loading, consult the section titled Lazy Loading earlier in this document.

Caching

Hibernate solves most of its performance problems by throwing more caching at them. Instead of relying on the database's strengths, Hibernate treats the database as a dumb data store. Hibernate's conceptual, object-oriented model of data precludes efficient use of SQL. Thus, the database is used suboptimally and is underutilized. When the inevitable performance problems appear, Hibernate adds another layer of caching. Its caching infrastructure is complex and only partially solves some of the performance problems. Moreover, a bird's-eye view of the cache internals reveals that it imitates the relational data model in memory. Hibernate's developers learned the hard way that cache invalidation is hard; it took database vendors years to implement and optimize caching while maintaining the transactional integrity of data and high performance. The problem with Hibernate's approach is that some performance issues can't be solved by caching alone. The key to making fast programs is to make them do as little as possible, and only THEN do it as fast as possible.

OneWebSQL uses caching sparingly. Most of the time, it relies on the database to cache data. Instead of fetching too much data and then caching it to recover some performance on subsequent operations, we fetch less data in the first place. This makes all database operations fast, predictable, and frugal. When the need to cache something in the user code arises, OneWebSQL DTOs are easier to work with than Hibernate entities to implement application-specific caching policy. For more information about Hibernate and OneWebSQL's differing approaches to caching, refer to the Caching section of this document.

Schema Maintenance

Database schema definition is spread over the whole code base in Hibernate. It's defined in HBM mapping files and annotations. This design makes it hard to grasp the big picture of the tables in the database and the relations between them. The more entities in the application, the more severe the problem. When the schema needs to change, the programmer has to manually find and update the application source code, be it mapping files or annotations. Database experts don't work with the application code or programming IDEs. They work with the database schema directly using specialized tools. Hibernate's support for schema migration or database-specific features is very limited in comparison to the tools provided by the database vendors. Therefore, it's hard to maintain and quickly adapt applications that use more than a few simple tables.

OneWebSQL relies on external, industrial-strength tools to do database schema design and maintenance. It reads the external schema model and generates the source code that exactly matches the database. The schema is explicitly edited via graphical tools, where it's easy to see the impact of any change made to the database tables. Every time the schema is updated, the external tools are used to handle database migration, and OneWebSQL regenerates the source code from scratch. Whenever an incompatible change is introduced, compiling the newly-generated code fails with an error. Programmers can then use an IDE to jump straight to the code that requires a fix. This considerably speeds up iterations between schema updates and code changes.

Summary

Hibernate tries to insulate the programmer from the database. It builds a lot of infrastructure to support the elaborate illusion that data stored in the database are nothing but objects, but the cost of using it quickly becomes prohibitively high. Hibernate promotes coding patterns that do not match the reality of the remote and expensive nature of each database call. The abstraction is leaky, and every time something breaks or performs poorly, you need to take a look at what's going on under the hood. As the application grows, more problems appear, and it's increasingly difficult to refactor the database and the code to keep up with changing business requirements. The result is that programmers fall back to native SQL or the development costs skyrocket.

OneWebSQL takes a data-centric approach to database and application design. It supports and actively promotes the recommended usage patterns for interacting with relational databases. OneWebSQL takes away the most error-prone parts of JDBC programming and replaces them with fast, predictable, and safe alternatives. OneWebSQL tries to get out of the programmer's way. Everything is explicit and exposed via APIs, so one can easily change or adapt existing mechanisms for application-specific needs. This gives the programmer full control over interactions between the application and the database. The resulting code is easy to debug, contains fewer errors, and is easier to tune for performance.
