Why this topic is worth discussing

It all started when we tried to set up Hibernate with Ehcache in a big project with a lot of entities to improve performance. It ended up taking a lot of time: we ran into a bunch of issues, some of which required a bit of refactoring, while others just took a long time to configure properly. Yes, you will find many of these things in the documentation, but they are not always on the surface and not always easy to spot.

Who may find this interesting

Developers who are setting up Hibernate caching in their app and are curious about possible pitfalls, or who already have issues and wonder why it isn’t performing as expected, will find this article helpful. Actually, I wish someone had written it for me before I started setting it up 🙂

Experimental setup

To demonstrate certain things I’ve created a simple Java project where I am using:

  • Hibernate version: 5.4.11.Final
  • Ehcache version: 3.8.1
  • PostgreSQL version: 10

My data structure is also simple:

[Diagram: the Company, Department and Employee entities]

In my application I have configured two caches: cachedEntities for entity caching and queryCache for query caching. Of course, I’ve activated both caches in my persistence.xml:

<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.cache.use_query_cache" value="true"/>

and enabled Hibernate query logging to see exactly what is happening.

In Java code I’ve annotated all three entities with

@Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "cachedEntities")

Following the data structure, Employee has a @ManyToOne reference to Department, and Department has a @ManyToOne reference to Company. In the other direction, Department has @OneToMany employees and Company has @OneToMany departments.

We will play a bit with this configuration in the process.

Catch #1

eagerly fetching collections? brrr

It may sound like a good idea at first: fetch some collections eagerly, so your application spends a lot of time on fetching everything up front but is faster at runtime because the data is already there. But the eager fetch type is generally considered bad practice: it may result in many uncontrolled queries being issued to the database.

But as we are discussing caching here, let’s see how the fetch type relates to it. In short, combining eagerly fetched collections with caching is practically useless. Let’s look at our example. Let’s say all entities are now cacheable. We will experiment with the Company entity, which has a set of departments in it:

@OneToMany(fetch = FetchType.LAZY, mappedBy = "company")
private Set<Department> departments = new HashSet<>(0);

Now let’s run the query which has query caching properly set up:

entityManager.createQuery("select company from Company company where company.id < 100")
  .setHint(QueryHints.HINT_CACHEABLE, "true")
  .setHint(QueryHints.HINT_CACHE_REGION, "queryCache")
  .getResultList();

This way, no unexpected queries are issued to the database once caching works properly. But be careful: changing the collection to FetchType.EAGER changes the game.

First run:

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id<100

Hibernate:
select department0_.company as company4_1_0_,
       department0_.id as id1_1_0_,
       department0_.id as id1_1_1_,
       department0_.company as company4_1_1_,
       department0_.name as name2_1_1_,
       department0_.occupation as occupati3_1_1_
from department department0_
where department0_.company=?

Hibernate:
select department0_.company as company4_1_0_,
       department0_.id as id1_1_0_,
       department0_.id as id1_1_1_,
       department0_.company as company4_1_1_,
       department0_.name as name2_1_1_,
       department0_.occupation as occupati3_1_1_
from department department0_
where department0_.company=?
...

So first we get all the company data, and then the company ids are used to query the departments belonging to each company. Why is this happening? Our query didn’t contain anything related to departments, but they are queried because of the EAGER fetch type, and this happens EVERY time you issue the query just to get companies, even after the query result is cached.

Second run:

Hibernate:
select department0_.company as company4_1_0_,
       department0_.id as id1_1_0_,
       department0_.id as id1_1_1_,
       department0_.company as company4_1_1_, 
       department0_.name as name2_1_1_, 
       department0_.occupation as occupati3_1_1_
from department department0_
where department0_.company=?

Hibernate:
select department0_.company as company4_1_0_,
       department0_.id as id1_1_0_, 
       department0_.id as id1_1_1_, 
       department0_.company as company4_1_1_, 
       department0_.name as name2_1_1_, 
       department0_.occupation as occupati3_1_1_
from department department0_
where department0_.company=?
...

The same as before, except that the first query doesn’t run again because its result was cached. The queries we see now run because of the eager fetch type. If Department also had its set of employees mapped with FetchType.EAGER, then, following the chain of eager fetches, employees would be queried too.

You may wonder why our properly set up entity cache doesn’t help. Look at the log: the queries fetch departments by company id. The entity cache stores entities by their own id, so it would help when looking up departments by department id, but not here.

So if you have FetchType.EAGER on a collection (which would need a strong reason, and that is unlikely), don’t expect caching to be of much help.

Takeaway:

Avoid using the EAGER fetch type for collections. Prefer writing queries instead, carefully adjusting them to your purpose and caching the query results in the right way (the wrong ways are explained in Catch #3 and Catch #4).

Catch #2

what about single-valued associations?

With the EAGER fetch type, every time you fetch your entity, all its associated entities are fetched too. Single-valued associations are easier to control, so @ManyToOne and @OneToOne are EAGER by default. But you should still be careful, otherwise caching won’t save you from repetitive queries to the database.

Let’s try to get an Employee by id:

entityManager.find(Employee.class, id);

The first time, it logs this query to the DB:

Hibernate:
select *
from employee employee0_
  left outer join department department1_ on employee0_.department=department1_.id
  left outer join company company2_ on department1_.company=company2_.id where employee0_.id=?

We can see that it actually queries the employee, department and company tables, because Employee has an association to Department and Department has one to Company, both eagerly fetched by default.

The second time, it takes all the values from the cache and logs no queries to the DB, which is exactly what we expect because we’ve marked all of them as cacheable.

Now let’s remove the @Cache annotation from Department, which means this entity won’t be stored in the entity cache, and try to find the Employee by id again.

First run:

Hibernate:
select *
from employee employee0_
  left outer join department department1_ on employee0_.department=department1_.id
  left outer join company company2_ on department1_.company=company2_.id where employee0_.id=?

Second run:

Hibernate:
select *
from department department0_
left outer join company company1_ on department0_.company=company1_.id
where department0_.id=?

The first time, it queries employee, department and company as normal.

The second time, it queries the department and company tables.

So yes, we cached Employee properly, but for its department we had cached only the id. Having this id, our application can either get the entity by id from the entity cache or go to the database again to gather the missing data. Our department was never put into the entity cache, so our app went to the DB.
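To make the mechanism more tangible, here is a minimal sketch in plain Java. It is a toy model and not Hibernate code: the class name, the maps and the sample values ("Alice", "Engineering") are all made up for illustration, and the only thing it tries to show is that a cached employee entry knows just the id of its department.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an entity cache entry that keeps only the id of an association. */
class AssociationCacheModel {
    /** What the cache keeps for an employee: its own fields plus the department id. */
    static final class EmployeeEntry {
        final String name;
        final int departmentId;

        EmployeeEntry(String name, int departmentId) {
            this.name = name;
            this.departmentId = departmentId;
        }
    }

    private final Map<Integer, EmployeeEntry> employeeCache = new HashMap<>();
    private final Map<Integer, String> departmentCache = new HashMap<>();
    private final boolean departmentCacheable;
    int dbQueries = 0; // number of simulated round trips to the database

    AssociationCacheModel(boolean departmentCacheable) {
        this.departmentCacheable = departmentCacheable;
    }

    /** Returns "employee @ department" for the single hypothetical employee. */
    String findEmployee(int employeeId) {
        EmployeeEntry entry = employeeCache.get(employeeId);
        if (entry == null) {
            // Cache miss: one joined query loads employee and department together.
            dbQueries++;
            entry = new EmployeeEntry("Alice", 10);
            employeeCache.put(employeeId, entry);
            if (departmentCacheable) {
                departmentCache.put(entry.departmentId, "Engineering");
            }
            return entry.name + " @ Engineering";
        }
        // Cache hit: only the department id is known, the rest must be resolved.
        String department = departmentCache.get(entry.departmentId);
        if (department == null) {
            dbQueries++; // department is in no cache, so it is queried again
            department = "Engineering";
        }
        return entry.name + " @ " + department;
    }

    public static void main(String[] args) {
        for (boolean cacheable : new boolean[] {false, true}) {
            AssociationCacheModel model = new AssociationCacheModel(cacheable);
            model.findEmployee(1); // first run
            model.findEmployee(1); // second run
            System.out.println("department cacheable=" + cacheable
                    + ", db queries=" + model.dbQueries);
        }
    }
}
```

In this model two lookups cost two simulated queries when the department is not cacheable and one when it is, which mirrors the logs above.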

Takeaway:

When you want to cache an entity, check all its …ToOne relations, which are eagerly fetched by default. Either make them lazily fetched or cache the related entities as well, otherwise queries to the DB will be made to fetch the missing data. Whatever works better for your project & data.

Catch #3 (my favourite)

query caching is killing the application performance

Let’s change the setup of our entities so that they are not stored in the entity cache. Now we are going to use only the query cache. To set up query caching, you need to explicitly add a hint to each query to make it cacheable and, optionally, specify the region where it is cached.

Let’s say we have a simple query that queries the companies:

entityManager.createQuery("select company from Company company where company.id < 100")
  .setHint(QueryHints.HINT_CACHEABLE, "true")
  .setHint(QueryHints.HINT_CACHE_REGION, "queryCache")
  .getResultList();

Let’s run this.

First run:

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id<100

Looks like a nice little cute query, right? Let’s run it again.

Second run:

Hibernate:
select company0_.id as id1_0_0_, 
       company0_.address as address2_0_0_,
       company0_.description as descript3_0_0_,
       company0_.name as name4_0_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_0_, 
       company0_.address as address2_0_0_,
       company0_.description as descript3_0_0_,
       company0_.name as name4_0_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_0_, 
       company0_.address as address2_0_0_,
       company0_.description as descript3_0_0_,
       company0_.name as name4_0_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_0_, 
       company0_.address as address2_0_0_,
       company0_.description as descript3_0_0_,
       company0_.name as name4_0_0_
from company company0_
where company0_.id=?
...

What? Now we have lots of queries instead of just one! So our query caching actually worsens our performance. For a query that returns entities, the query cache stores only their ids, which are then used to get the rest of the entity data, either from the entity cache or from the database. To use the query cache we MUST use an entity cache too. Now let’s annotate Company with @Cache and try again. The first run looks exactly the same; the second time, no queries are issued to the DB. Perfect!
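Here is a small plain-Java sketch of why that happens. Again, this is a simplified model under my own assumptions, not how Hibernate is implemented: the class, the maps and the five sample companies are invented for illustration. The query cache in the model stores only ids, and every id that is missing from the entity cache costs one extra "where id=?" query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a query cache that stores only entity ids. */
class QueryCacheModel {
    // Pretend database table: company id -> company name.
    private final Map<Integer, String> companyTable = new HashMap<>();
    // Query cache: query text -> ids of the matching rows (no entity data).
    private final Map<String, List<Integer>> queryCache = new HashMap<>();
    // Entity cache: company id -> company data; stays empty when disabled.
    private final Map<Integer, String> entityCache = new HashMap<>();
    private final boolean entityCacheEnabled;
    int dbQueries = 0; // number of simulated round trips to the database

    QueryCacheModel(boolean entityCacheEnabled) {
        this.entityCacheEnabled = entityCacheEnabled;
        for (int id = 1; id <= 5; id++) {
            companyTable.put(id, "Company " + id);
        }
    }

    /** Runs the single query of this model through the query cache. */
    List<String> findAllCompanies() {
        String queryKey = "select company from Company company";
        List<Integer> ids = queryCache.get(queryKey);
        if (ids == null) {
            // Query cache miss: one query returns all rows at once.
            dbQueries++;
            queryCache.put(queryKey, new ArrayList<>(companyTable.keySet()));
            if (entityCacheEnabled) {
                entityCache.putAll(companyTable);
            }
            return new ArrayList<>(companyTable.values());
        }
        // Query cache hit: only ids are known, so each entity is resolved by id.
        List<String> result = new ArrayList<>();
        for (Integer id : ids) {
            String company = entityCache.get(id);
            if (company == null) {
                dbQueries++; // one "where id=?" query per missing entity
                company = companyTable.get(id);
            }
            result.add(company);
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {false, true}) {
            QueryCacheModel model = new QueryCacheModel(enabled);
            model.findAllCompanies(); // first run fills the query cache
            model.findAllCompanies(); // second run hits the query cache
            System.out.println("entity cache enabled=" + enabled
                    + ", db queries=" + model.dbQueries);
        }
    }
}
```

With five companies the model counts six simulated queries without the entity cache (one for the first run, five for the second) and just one with it.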

Takeaway:

Use the entity cache whenever you use the query cache, otherwise query caching will be a very doubtful performance improvement.

Catch #4

queries with parameters: overcache

It may be obvious by now that queries with parameters are not really compatible with query caching unless your application often runs them with the same values. That can be the case when you filter by some small set of values.

Example: you have only 3 Companies and query all departments with the company id as a parameter; that’s probably OK. But if you have 100000 Companies and any of them can end up as the parameter, it’s not a good idea. Your application will be busy caching every query as a different one, and this will worsen your performance.
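The effect of the number of distinct parameter values can be sketched in a few lines of plain Java. This is only an illustration under my own assumptions: the cache key is modelled as the query text plus the parameter value, the JPQL string is a hypothetical query for this data model, and the call counts are arbitrary.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a query cache keyed by query text plus parameter values. */
class ParameterizedQueryCacheModel {
    private final Map<List<Object>, String> queryCache = new HashMap<>();
    int hits = 0;
    int misses = 0;

    /** Simulates running "departments by company id" through the query cache. */
    void findDepartmentsByCompany(int companyId) {
        // Hypothetical JPQL; a different parameter value gives a different key.
        List<Object> key = Arrays.<Object>asList(
                "select d from Department d where d.company.id = :companyId",
                companyId);
        if (queryCache.containsKey(key)) {
            hits++;
        } else {
            misses++; // would go to the database and then store a new entry
            queryCache.put(key, "result for company " + companyId);
        }
    }

    int cachedEntries() {
        return queryCache.size();
    }

    /** Runs the query `calls` times, cycling through `distinctCompanies` ids. */
    static ParameterizedQueryCacheModel simulate(int distinctCompanies, int calls) {
        ParameterizedQueryCacheModel model = new ParameterizedQueryCacheModel();
        for (int i = 0; i < calls; i++) {
            model.findDepartmentsByCompany(i % distinctCompanies);
        }
        return model;
    }

    public static void main(String[] args) {
        ParameterizedQueryCacheModel few = simulate(3, 1000);
        ParameterizedQueryCacheModel many = simulate(100000, 1000);
        System.out.println("3 companies: hits=" + few.hits
                + ", entries=" + few.cachedEntries());
        System.out.println("100000 companies: hits=" + many.hits
                + ", entries=" + many.cachedEntries());
    }
}
```

With 3 possible company ids, 1000 calls produce 3 misses and 997 hits. With 100000 possible ids, the same 1000 calls in this model never repeat a value, so every call is a miss and only adds another entry to the cache.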

Sometimes it is all about deciding what would perform better: fetching all Departments with a cacheable query and then filtering the result further in the application, or having no query caching for this query at all and doing proper filtering in the query itself. It all really depends on your data and the amount of it.

Takeaway:

Be careful when combining the query cache with parameterized queries: it pays off only when the same parameter values recur often.

Catch #5

cache settings: expire and overfill

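For each cache you can configure these values separately in the ehcache.xml file (the attribute names below follow the Ehcache 2.x XML format; Ehcache 3.x expresses the same limits through its expiry and resources elements):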

timeToIdleSeconds="300"
timeToLiveSeconds="600"

They can also be set up via Java code, whichever works better for you. In this example the values mean that a cached value lives at most 600 seconds after creation, and only 300 seconds if it is not accessed. By default both values are 0, which means entries never expire.
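How the two limits interact is easy to get wrong, so here is a tiny plain-Java sketch of the rule described above. It is a model of the semantics only, not Ehcache code; the method name and the exact boundary handling (expired once the limit is reached) are my assumptions.

```java
/** Toy model of time-to-live / time-to-idle expiry. */
class ExpiryModel {
    /**
     * All arguments are in seconds on the same clock.
     * A limit of 0 is treated as "no limit".
     */
    static boolean isExpired(long createdAt, long lastAccessedAt, long now,
                             long timeToIdleSeconds, long timeToLiveSeconds) {
        boolean tooOld = timeToLiveSeconds > 0
                && now - createdAt >= timeToLiveSeconds;
        boolean tooIdle = timeToIdleSeconds > 0
                && now - lastAccessedAt >= timeToIdleSeconds;
        return tooOld || tooIdle;
    }

    public static void main(String[] args) {
        // Created at t=0 and read again at t=250: still alive at t=500.
        System.out.println(isExpired(0, 250, 500, 300, 600));
        // Created at t=0, read at t=500: too old at t=601 despite the recent read.
        System.out.println(isExpired(0, 500, 601, 300, 600));
        // Never read after creation: idle for too long at t=301.
        System.out.println(isExpired(0, 0, 301, 300, 600));
    }
}
```

The second case is the one that tends to surprise: frequent reads keep an entry from going idle, but they do not extend its time to live.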

I made some tests to demonstrate the behaviour with different expiration settings for our caches. When we run the query:

entityManager.createQuery("SELECT company from Company company where company.id < 100")
       .setHint(QueryHints.HINT_CACHEABLE, "true")
       .setHint(QueryHints.HINT_CACHE_REGION, "queryCache")
       .getResultList();

First run result:

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id<100

Then we run it again and, if neither the entity cache nor the query cache has expired in the meantime, it looks just fine: no queries are issued to the database.

When the entity cache expires before the query cache (the most dangerous situation, which brings us back to Catch #3):

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id=?

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id=?
...

When both expire at the same time (no more dangerous than having no caching set up at all):

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id<100

And just for fun, when the query cache expires before the entity cache (the logged query looks as expected):

Hibernate:
select company0_.id as id1_0_,
       company0_.address as address2_0_,
       company0_.description as descript3_0_,
       company0_.name as name4_0_
from company company0_
where company0_.id<100

The same goes for the following settings:

maxEntriesLocalHeap="10000"
maxEntriesLocalDisk="1000"

They specify the cache size, that is, how many records the cache can keep. Make sure this size is properly configured, otherwise you risk running into the same problems as discussed above.

If your cache is full, some entities/queries are dropped when new ones are added, while you still expect them to be present in your cache. That leads to queries being issued to your DB.
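A size-bounded cache can be sketched with the JDK’s LinkedHashMap. Note the assumptions: this sketch evicts the least recently used entry, which is a common policy but not necessarily what Ehcache does internally, and the class name is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded cache that drops the least recently used entry when full. */
class BoundedCacheModel<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCacheModel(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put: evict once the limit is exceeded.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheModel<Integer, String> cache = new BoundedCacheModel<>(2);
        cache.put(1, "Company 1");
        cache.put(2, "Company 2");
        cache.get(1);              // Company 1 becomes the most recently used
        cache.put(3, "Company 3"); // cache is full, so Company 2 is dropped
        System.out.println(cache.keySet());
    }
}
```

If a cached query result still lists the id of the dropped company, that company has to be loaded from the database again, which is the same effect as in Catch #3.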

If you want better control over how many records you keep for each query or how long you keep them, you will need to set up more caches with the desired values.

Takeaway:

Remember: to use query caching properly, we have to use entity caching too. Make sure that the values in your entity cache don’t expire before your cached queries and that they all fit in there if you need them cached, otherwise you end up worsening your performance (see Catch #3).

Configure your caches carefully so you don’t bump into unexpected issues.

Conclusion

Of course, there are many more things to look into when something goes wrong; for instance, there are also different CacheConcurrencyStrategy values. The goal of this article wasn’t to cover everything, but to show some real examples of how the wrong configuration can worsen the performance of your application. General suggestion: if your application behaves funny, try logging the queries that are issued to the database, or the cache hits and misses. That may give you an idea of what is set up wrong.

Often the problem lies in a lack of understanding of how Ehcache really works, or in a lack of attention to specific settings. All the pitfalls discussed above may look like funny mistakes, but it’s surprising how often we make them in real projects. I hope this saves some of you time on setting it up 🙂

Good luck!