Search This Blog

Tuesday 30 October 2012

The wrong result when using fetch and DISTINCT keywords

Consider a query to fetch all entities who have a child with id 1.
public static void testQuerySingleE() {
    final Session session = sessionFactory.openSession();  
    Transaction transaction = session.beginTransaction();
    System.out.println("Testing the loading");
    Query q = session.createQuery("select e from Entity e" + 
                           " join fetch e.children c where c.id = 1"); 
    @SuppressWarnings("unchecked")
    List<Entity> entities =  q.list();
    for (Entity entity : entities ) {
        System.out.println("entity is " + entity + " and no of kids are " 
                   + entity.getChildren().size());
    }
    transaction.commit();
    session.close();
}
The output is as below:
/* select
        e 
    from
        Entity e 
    join
        fetch e.children c 
    where
        c.id = 1 */ 
    select
            entity0_.ID as ID0_0_,
            children1_.ID as ID2_1_,
            entity0_.NAME as NAME0_0_,
            entity0_.DATE as DATE0_0_,
            entity0_.MASTER_ID as MASTER4_0_0_,
            children1_.`KEY` as KEY2_2_1_,
            children1_.ENTITY_ID as ENTITY3_2_1_,
            children1_.ENTITY_ID as ENTITY3_0__,
            children1_.ID as ID0__ 
        from
            ENTITY entity0_ 
        inner join
            CHILD_ENTITY children1_ 
                on entity0_.ID=children1_.ENTITY_ID 
        where
            children1_.ID=1
entity is [Entity] : ( id 1 , data : entity1 , master.Id : 1 , date : null )] an
d no of kids are 1
The join query returned a set of 2 entity-child record and the where clause trimmed it to 1. Everything is fine.
Or is it ?
The size of the set is indicated as 1. This is not the case. If I remove the where condition then the console logs would be :
Testing the loading
Hibernate: 
    /* select
        e 
    from
        Entity e 
    join
        fetch e.children c */ 
        select
            entity0_.ID as ID0_0_,
            children1_.ID as ID2_1_,
            entity0_.NAME as NAME0_0_,
            entity0_.DATE as DATE0_0_,
            entity0_.MASTER_ID as MASTER4_0_0_,
            children1_.`KEY` as KEY2_2_1_,
            children1_.ENTITY_ID as ENTITY3_2_1_,
            children1_.ENTITY_ID as ENTITY3_0__,
            children1_.ID as ID0__ 
        from
            ENTITY entity0_ 
        inner join
            CHILD_ENTITY children1_ 
                on entity0_.ID=children1_.ENTITY_ID
entity is [Entity] : ( id 1 , data : entity1 , master.Id : 1 , date : null )] an
d no of kids are 2
entity is [Entity] : ( id 1 , data : entity1 , master.Id : 1 , date : null )] an
d no of kids are 2
The problem is that we have used the fetch attribute.
  1. This means an eager load for association. 
  2. This results in Hibernate using the association columns to fill up the set. So the data from the two Records was used to create the two children for Entity1.
  3. But with the where clause we applied the condition on our children table, this reduced the size of the result set and hence the children set has fewer than correct elements.
Thus if you are eager fetching a collection, do not apply any restriction on the same.
The second query while revealing the correct size of children faced another problem. The Entity object appears twice in the list. This is repeat data and unnecessary.
We need the entity to be returned once only. For this we use the DISTINCT keyword.
The changed code is
public static void testQuerySingleE() {
    final Session session = sessionFactory.openSession();
    Transaction transaction = session.beginTransaction();
    System.out.println("Testing the loading");
    Query q = session.createQuery("select distinct(e) from " 
              + "Entity e join fetch e.children c");
              // where c.id = 1"); 
    List<Entity> entities =  q.list();
    for (Entity entity : entities ) {
        System.out.println("entity is " + entity 
            + " and no of kids are " + entity.getChildren().size());
    }
    transaction.commit();
    session.close();
}
and the logs are :
Testing the loading
Hibernate: 
    /* select
        distinct(e) 
    from
        Entity e 
    join
        fetch e.children c */ 
        select
            distinct entity0_.ID as ID0_0_,
            children1_.ID as ID2_1_,
            entity0_.NAME as NAME0_0_,
            entity0_.DATE as DATE0_0_,
            entity0_.MASTER_ID as MASTER4_0_0_,
            children1_.`KEY` as KEY2_2_1_,
            children1_.ENTITY_ID as ENTITY3_2_1_,
            children1_.ENTITY_ID as ENTITY3_0__,
            children1_.ID as ID0__ 
        from
            ENTITY entity0_ 
        inner join
            CHILD_ENTITY children1_ 
                on entity0_.ID=children1_.ENTITY_ID
entity is [Entity] : ( id 1 , data : entity1 , master.Id : 1 , date : null )] an
d no of kids are 2
The distinct keyword is visible in the SQL logs.
I executed the above query direct on the MySQL workbench. The result is :
 As can be seen the result set has the same Entity details occurring twice, in fact you would get the same result if you removed the distinct command from the SQL.
The distinct keyword does not affect the SQL query in any manner. It is actually used by hibernate. The presence of distinct keyword ensures that Hibernates does the filtering in memory and removes all duplicates. So our result has only one entity record now.

No comments:

Post a Comment