Debugging deadlocks in JPA and PostgreSQL

26 May 2009

Stéphane Épardaud

by Stéphane Épardaud

In this article I describe a recent problem I had with PostgreSQL, where a deadlock caused my application to crash for unclear reasons. Naturally I also describe how to debug such problems and the reason for the deadlock.

Our application

We have a large transaction doing batch uploads which inserts rows of item_component. Every 100 entities inserted it will flush the session to the DB in order to save memory. That does not commit the transaction, it just sends the SQL commands to the database.

For each item inserted we create a separate transaction which:

  1. reads a person row,

  2. locks it with LockMode.UPGRADE to prevent concurrent modification (this is implemented with SELECT FOR UPDATE,

  3. increments a counter in one column,

  4. commits, thus dropping back the lock.

Because our batch insertion transaction does not need to lock the person table, this should work fine.

Finding the deadlock

Our code does not work: the first 100 items are created fine, the person objects are locked and incremented 100 times, but when I flush the first 100 inserted items to the DB (which works OK), the next person lock/increment hangs forever. Deadlock.

After ruling out anything that is triggered by our JPA code (disable Hibernate Search, entity listeners, caching), I have to conclude this could be a database mystery.

The first step is to find out the PostgreSQL transaction IDs of my JPA transactions. This can be done using the code shown below. Our batch insertion has TX ID 3/1501 while our blocking lock/increment TX ID is 2/4942. If we list all lock and transaction information using the query shown below (after filtering out locks on indexes, views and sequences and removing columns we do not care about) we get the following results:

# Type TX ID VTX ID Mode G Name K SQL W

TX = Transaction; VTX = Virtual TX; G = Granted; K = Kind; W = Waiting

1

tuple

2/4941

ExclusiveLock

t

person

r

select person0_.pers…​rson_id=$1 for update

t

2

relation

2/4941

RowShareLock

t

person

r

select person0_.pers…​rson_id=$1 for update

t

3

virtualxid

2/4941

ExclusiveLock

t

select person0_.pers…​rson_id=$1 for update

t

4

transactionid

774990

2/4941

ShareLock

f

select person0_.pers…​rson_id=$1 for update

t

5

relation

3/1501

RowShareLock

t

person

r

<IDLE> in transaction

f

6

relation

3/1501

RowExclusiveLock

t

item_component

r

<IDLE> in transaction

f

7

relation

3/1501

AccessShareLock

t

person

r

<IDLE> in transaction

f

8

virtualxid

3/1501

ExclusiveLock

t

<IDLE> in transaction

f

9

transactionid

774990

3/1501

ExclusiveLock

t

<IDLE> in transaction

f

Let’s first describe what each line is about:

  1. This is an exclusive lock on the person row we are locking

  2. The PostgreSQL manual describes this lock as being taken for FOR UPDATE

  3. The PostgreSQL manual explains that Every transaction holds an exclusive lock on its virtual transaction ID for its entire duration.

  4. The PostgreSQL manual explains that When one transaction finds it necessary to wait specifically for another transaction, it does so by attempting to acquire share lock on the other transaction ID. So here we have the lock/increment 2/4941 transaction waiting for our batch insertion transaction 3/1501 to complete.

  5. Why this transaction holds this lock is beyond me. The PostgreSQL manual explains that this lock is obtained for SELECT FOR UPDATE|SHARE which is never called by this transaction, so go figure…​

  6. This is a normal lock obtained automatically when inserting an item in a table. The table here is item_component.

  7. This is a normal lock obtained automatically when selecting an item from a table. This lock on person cannot harm and is entirely normal.

  8. The PostgreSQL manual explains that Every transaction holds an exclusive lock on its virtual transaction ID for its entire duration.

  9. The PostgreSQL manual explains If a permanent ID is assigned to the transaction (which normally happens only if the transaction changes the state of the database), it also holds an exclusive lock on its permanent transaction ID until it ends

The diagnosis

My lock/increment transaction is waiting for the batch insertion transaction to release its lock on the person row it is trying to lock. We now have to find out why it acquired this lock since I did not do it manually.

PostgreSQL implicit locks

After trying it myself in a PostgreSQL console, the following SQL causes a ROW SHARE lock to be obtained on the person row referenced by foreign key 2:

begin;
insert into item_component (item_component_id, creator_id) values (nextval('hibernate_sequence'), 2);

Of course this ROW SHARE lock is held until my batch insert transaction is committed, and prevents my lock/increment transactions from doing their job.

After a friendly chat on the PostgreSQL IRC channel I was able to get a confirmation that this behaviour is normal and a result of the RI (Referential Integrity) triggers. I have sent a mail to the documentation mailing list asking for better documentation of such implicit side-effect locks that will help others like me not waste days finding this out.

Now I need to find a better strategy for my code.

Appendix

[DebuggingdeadlocksinJPAandPostgreSQL-tid]#

=== Display Virtual Transaction ID in JPA

Use this code to find out your PostgreSQL virtual transaction ID:

 public Object getTXID() {
  Query query =
   entityManager.createNativeQuery("select virtualtransaction from pg_locks where pid = pg_backend_pid()");
  return query.getSingleResult();
 }

[DebuggingdeadlocksinJPAandPostgreSQL-query]#

Listing all locks and their transactions

Use this SQL to list all locks and their transactions:

select
 locktype, database, relation, page, tuple, transactionid, virtualtransaction, mode, granted,
 relname, relkind,
 datname, usename,
 case
  when length(current_query) > 40
    then substring(current_query for 20) || '...'
         || substring(current_query from (length(current_query) - 19) for 20)
  else current_query
 end as query,
 waiting, query_start, client_addr, client_port
from pg_locks
 left outer join pg_class on oid = relation
 left outer join pg_stat_activity on pid = procpid
order by pid;