
High-volume database
traffic is a frequent cause of performance problems in Web applications.
Hibernate is a high-performance, object/relational persistence and
query service, but it won't solve all your performance issues without a
little help. In many cases, second-level caching can be just what
Hibernate needs to realize its full performance potential. This
article examines Hibernate's caching functionalities and shows how you
can use them to significantly boost application performance.
An Introduction to Caching
Caching is widely used for optimizing database applications. A cache is
designed to reduce traffic between your application and the database by
conserving data already loaded from the database. Database access is
necessary only when retrieving data that is not currently available in
the cache. The application may need to empty (invalidate) the cache from
time to time if the database is updated or modified by some other means,
because the cache has no way of knowing whether its contents are still up to date.
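The idea described above can be sketched in a few lines of Java. This is only an illustration of the general read-through pattern, not Hibernate code: the class name ReadThroughCache, the loader function (which stands in for a database query), and the loadCount() counter are all assumptions introduced for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration of a read-through cache; not part of Hibernate.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;            // stands in for a database query
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on a cache miss.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Empties the cache, for example after the database was changed elsewhere.
    void invalidate() {
        entries.clear();
    }

    // Number of times the loader (the "database") was actually hit.
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        ReadThroughCache<Integer, String> cache =
                new ReadThroughCache<>(id -> "country-" + id);
        cache.get(30);
        cache.get(30);                               // served from the cache
        System.out.println("loads: " + cache.loadCount());
    }
}
```

Repeated calls to get() with the same key reach the loader only once until invalidate() is called, which is the saving a cache is meant to provide.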
Hibernate Caching
Hibernate uses two different caches for objects: a first-level cache and
a second-level cache. The first-level cache is associated with the Session
object, while the second-level cache is associated with the SessionFactory
object. By default, Hibernate uses the first-level cache on a
per-transaction basis. Hibernate uses this cache mainly to reduce the
number of SQL queries it needs to generate within a given transaction.
For example, if an object is modified several times within the same
transaction, Hibernate will generate only one SQL UPDATE statement at
the end of the transaction, containing all the modifications.
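To make the coalescing of updates concrete, here is a deliberately simplified toy model. It is not how Hibernate's Session is implemented (Hibernate performs dirty checking against loaded state); the ToySession class, its set() and commit() methods, and the generated SQL text are assumptions made up for this sketch. The table and column names are borrowed from the Country mapping used later in the article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model only: it mimics the visible effect (one UPDATE per modified
// object at commit time), not Hibernate's actual mechanism.
class ToySession {
    // Pending changes per entity id: column name -> latest value.
    private final Map<Long, Map<String, Object>> pending = new LinkedHashMap<>();
    private final List<String> executedSql = new ArrayList<>();

    // Records a modification in memory; no SQL is issued yet.
    void set(long id, String column, Object value) {
        pending.computeIfAbsent(id, k -> new LinkedHashMap<>()).put(column, value);
    }

    // Emits a single UPDATE per modified entity, covering all its changes.
    void commit() {
        for (Map<String, Object> changes : pending.values()) {
            String assignments = changes.keySet().stream()
                    .map(column -> column + " = ?")
                    .collect(Collectors.joining(", "));
            executedSql.add("UPDATE COUNTRY SET " + assignments + " WHERE cn_id = ?");
        }
        pending.clear();
    }

    List<String> executedSql() {
        return executedSql;
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.set(30L, "cn_name", "Botswana");
        session.set(30L, "cn_code", "bw");
        session.set(30L, "cn_name", "Republic of Botswana");
        session.commit();
        session.executedSql().forEach(System.out::println);
    }
}
```

Three modifications to the same object result in a single UPDATE statement when the transaction is committed.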
This article focuses on the second-level cache. To reduce database traffic,
the second-level cache keeps loaded objects at the SessionFactory level
between transactions. These objects are available to the whole
application, not just to the user running the query. This way, each time
a query returns an object that is already loaded in the cache, one or
more database queries can potentially be avoided.
In addition, you can use a query-level cache if you need to cache actual query results, rather than just persistent objects.
Cache Implementations
Caches are complicated pieces of software,
and the market offers quite a number of choices, both open source and
commercial. Hibernate supports the following open-source cache
implementations out-of-the-box:
- EHCache (org.hibernate.cache.EhCacheProvider)
- OSCache (org.hibernate.cache.OSCacheProvider)
- SwarmCache (org.hibernate.cache.SwarmCacheProvider)
- JBoss TreeCache (org.hibernate.cache.TreeCacheProvider)
Each cache provides different capabilities in terms of performance, memory use, and configuration possibilities:
- EHCache is a fast, lightweight, and easy-to-use in-process cache. It
supports read-only and read/write caching, and memory- and disk-based
caching. However, it does not support clustering.
- OSCache is another open-source caching solution. It is part of a larger
package, which also provides caching functionalities for JSP pages or
arbitrary objects. It is a powerful and flexible
package, which, like EHCache, supports read-only and read/write
caching, and memory- and disk-based caching. It also provides basic
support for clustering via either JavaGroups or JMS.
- SwarmCache is a simple cluster-based caching solution based on
JavaGroups. It supports read-only or nonstrict read/write caching (the
next section explains this term). This type of cache is appropriate for
applications that typically have many more read operations than write
operations.
- JBoss TreeCache is a powerful replicated (synchronous or asynchronous)
and transactional cache. Use this solution if you really need a true
transaction-capable caching architecture.
Another cache implementation worth mentioning is the commercial
Tangosol Coherence cache.
Caching Strategies
Once you have chosen your cache implementation, you need to specify your
access strategies. The following four caching strategies are available:
- Read-only: This strategy is useful for data that is read frequently
but never updated. This is by far the simplest and best-performing cache
strategy.
- Read/write: Read/write caches may be appropriate if your data needs
to be updated. They carry more overhead than read-only caches. In
non-JTA environments, each transaction should be completed when
Session.close() or Session.disconnect() is called.
- Nonstrict read/write: This strategy does not guarantee that two
transactions won't simultaneously modify the same data. Therefore, it
may be most appropriate for data that is read often but only
occasionally modified.
- Transactional: This is a fully transactional cache that may be used only in a JTA environment.
Support for these strategies is not identical for every cache
implementation. Table 1 shows the options available for the different
cache implementations.
Cache           | Read-only | Nonstrict Read/write | Read/write | Transactional
EHCache         | Yes       | Yes                  | Yes        | No
OSCache         | Yes       | Yes                  | Yes        | No
SwarmCache      | Yes       | Yes                  | No         | No
JBoss TreeCache | Yes       | No                   | No         | Yes
Table 1. Supported Caching Strategies for Hibernate Out-of-the-Box Cache Implementations
The remainder of the article demonstrates single-JVM caching using EHCache.
Cache Configuration
To activate second-level caching, you need to define the
hibernate.cache.provider_class property in the hibernate.cfg.xml file as
follows:
<hibernate-configuration>
  <session-factory>
    ...
    <property name="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</property>
    ...
  </session-factory>
</hibernate-configuration>
For testing purposes in Hibernate 3, you may also want to use the
hibernate.cache.use_second_level_cache property, which allows you to
activate (and deactivate) the second-level cache. By default, the
second-level cache is activated and uses the EHCache provider.
A Practical Application
Figure 1. The Employee UML Class Diagram
The sample demo application for this article contains four simple
tables: a list of countries, a list of airports, a list of employees,
and a list of spoken languages. Each employee is assigned a country, and
can speak many languages. Each country can have any number of airports.
Figure 1 shows the UML class diagram for the application, and
Figure 2 shows the database schema. The sample application source code
contains the following SQL scripts, which you need in order to create and
populate the corresponding database:
- src/sql/create.sql: The SQL script used to create the database.
- src/sql/init.sql: Test data
Note on Installing Maven 2
At the time of writing, the Maven 2 repository seemed to be missing some
jars. To get around this problem, find the missing jars in the root
directory of the application source code. To install them in the Maven 2
repository, go to the app directory and execute the following
instructions:
$ mvn install:install-file -DgroupId=javax.security -DartifactId=jacc -Dversion=1.0 \
    -Dpackaging=jar -Dfile=jacc-1.0.jar
$ mvn install:install-file -DgroupId=javax.transaction -DartifactId=jta -Dversion=1.0.1B \
    -Dpackaging=jar -Dfile=jta-1.0.1B.jar
Setting Up a Read-Only Cache
To begin with something simple, here's the Hibernate mapping for the Country class:
<hibernate-mapping package="com.wakaleo.articles.caching.businessobjects">
  <class name="Country" table="COUNTRY" dynamic-update="true">
    <meta attribute="implement-equals">true</meta>
    <cache usage="read-only"/>
    <id name="id" type="long" unsaved-value="null">
      <column name="cn_id" not-null="true"/>
      <generator class="increment"/>
    </id>
    <property column="cn_code" name="code" type="string"/>
    <property column="cn_name" name="name" type="string"/>
    <set name="airports">
      <key column="cn_id"/>
      <one-to-many class="Airport"/>
    </set>
  </class>
</hibernate-mapping>
Suppose you need to display a list of all countries. You could implement
this with a simple method in the CountryDAO class as follows:
public class CountryDAO {
    ...
    public List getCountries() {
        return SessionManager.currentSession()
                .createQuery("from Country as c order by c.name")
                .list();
    }
}
Because this method is likely to be called often, you need to see how it
behaves under pressure. So write a simple unit test that simulates five
successive calls:
public void testGetCountries() {
    CountryDAO dao = new CountryDAO();
    for (int i = 1; i <= 5; i++) {
        Transaction tx = SessionManager.getSession().beginTransaction();
        TestTimer timer = new TestTimer("testGetCountries");
        List countries = dao.getCountries();
        tx.commit();
        SessionManager.closeSession();
        timer.done();
        assertNotNull(countries);
        // JUnit convention: expected value first, then the actual value
        assertEquals(229, countries.size());
    }
}
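The TestTimer class used in this test belongs to the demo application and is not listed in the article. A minimal sketch of such a helper might look like the following, assuming it simply measures the wall-clock time between construction and the call to done() and prints a line such as "testGetCountries: 521 ms."; the implementation details (and the returned value) are guesses made for illustration.

```java
// Hypothetical stand-in for the demo application's TestTimer helper.
class TestTimer {
    private final String name;
    private final long startNanos;

    // Starts timing as soon as the timer is created.
    TestTimer(String name) {
        this.name = name;
        this.startNanos = System.nanoTime();
    }

    // Prints "<name>: <elapsed> ms." and returns the elapsed milliseconds.
    long done() {
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        System.out.println(name + ": " + elapsedMs + " ms.");
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        TestTimer timer = new TestTimer("example");
        Thread.sleep(50);   // simulated work
        timer.done();
    }
}
```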
You can run this test from either your preferred IDE or the command line
using Maven 2 (the demo application provides the Maven 2 project
files). The demo application was tested using a local MySQL database.
When you run this test, you should get something like the following:
$ mvn test -Dtest=CountryDAOTest
...
testGetCountries: 521 ms.
testGetCountries: 357 ms.
testGetCountries: 249 ms.
testGetCountries: 257 ms.
testGetCountries: 355 ms.
[surefire] Running com.wakaleo.articles.caching.dao.CountryDAOTest
[surefire] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3,504 sec
Each call takes between a quarter and half a second, which is a bit
sluggish by most standards. The list of countries probably doesn't
change very often, so this class is a good candidate for a
read-only cache.
You can activate second-level caching for a class in one of the following two ways:
- You can activate it on a class-by-class basis in the *.hbm.xml file, using the cache element:
<hibernate-mapping package="com.wakaleo.articles.caching.businessobjects">
  <class name="Country" table="COUNTRY" dynamic-update="true">
    <meta attribute="implement-equals">true</meta>
    <cache usage="read-only"/>
    ...
  </class>
</hibernate-mapping>
- You can store all cache information in the hibernate.cfg.xml file, using the class-cache element:
<hibernate-configuration>
  <session-factory>
    ...
    <property name="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</property>
    ...
    <class-cache
        class="com.wakaleo.articles.caching.businessobjects.Country"
        usage="read-only"
    />
  </session-factory>
</hibernate-configuration>
Next, you need to configure the cache rules for this class. These rules
determine the nitty-gritty details of how the cache will behave. The
examples in this demo use EHCache, but remember that each cache
implementation is different.
EHCache needs a configuration file (generally called ehcache.xml) at the
classpath root. The EHCache configuration file is well documented on
the project Web site.
Basically, you define rules for each class you want to store, as well
as a defaultCache entry for use when you don't explicitly give any rules
for a class.
For the first example, you can use the following simple EHCache configuration file:
<ehcache>
  <diskStore path="java.io.tmpdir"/>
  <defaultCache
      maxElementsInMemory="10000"
      eternal="false"
      timeToIdleSeconds="120"
      timeToLiveSeconds="120"
      overflowToDisk="true"
      diskPersistent="false"
      diskExpiryThreadIntervalSeconds="120"
      memoryStoreEvictionPolicy="LRU"
  />
  <cache name="com.wakaleo.articles.caching.businessobjects.Country"
      maxElementsInMemory="300"
      eternal="true"
      overflowToDisk="false"
  />
</ehcache>
This file sets up a memory-based cache for Country objects with at
most 300 elements (the country list contains 229 countries). Note that
cached entries never expire (the eternal="true" attribute).
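To get a feel for what a size-bounded, memory-only cache does, the following sketch models a cache capped at a fixed number of entries with least-recently-used (LRU) eviction, the policy named in the defaultCache entry. It is not EHCache code: the BoundedLruCache class is an assumption built on the JDK's LinkedHashMap, and it only illustrates the size limit, not expiry or disk overflow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a size-bounded LRU cache; not EHCache itself.
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxElementsInMemory;

    BoundedLruCache(int maxElementsInMemory) {
        super(16, 0.75f, true);   // true = iterate in access order (LRU first)
        this.maxElementsInMemory = maxElementsInMemory;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxElementsInMemory;
    }

    public static void main(String[] args) {
        BoundedLruCache<Integer, String> cache = new BoundedLruCache<>(2);
        cache.put(30, "Botswana");
        cache.put(214, "Uruguay");
        cache.get(30);                 // 30 is now the most recently used entry
        cache.put(158, "Panama");      // evicts 214, the least recently used
        System.out.println(cache.keySet());
    }
}
```

With a limit of 300 and only 229 countries, the Country cache in this article never reaches the point where an entry has to be evicted.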
Now, rerun the tests to see how the cache performs:
$ mvn test -Dtest=CountryDAOTest
...
testGetCountries: 412 ms.
testGetCountries: 98 ms.
testGetCountries: 92 ms.
testGetCountries: 82 ms.
testGetCountries: 93 ms.
[surefire] Running com.wakaleo.articles.caching.dao.CountryDAOTest
[surefire] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2,823 sec
As you would expect, the first query is barely faster than before, because
the first time around the data still has to be loaded from the database.
However, all subsequent queries are roughly three to four times faster.
Behind the Scenes
Before moving on, it is useful to look at what's going on behind the
scenes. One thing you should know is that the Hibernate cache does not
store object instances. Instead, it stores objects in their "dehydrated"
form (to use Hibernate terminology), that is, as a set of property
values. The following is a sample of the contents of the Country cache:
{
  30 => [bw,Botswana,30],
  214 => [uy,Uruguay,214],
  158 => [pa,Panama,158],
  31 => [by,Belarus,31],
  95 => [in,India,95],
  ...
}
Notice how each ID is mapped to an array of property values. You may
also have noticed that only the primitive properties are stored; there
is no sign of the airports property. This is because the airports
property is actually an association: a set of references to other
persistent objects.
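The notion of a "dehydrated" entry can be illustrated with a small sketch. The classes below are simplified stand-ins written for this example (a Country with only id, code, and name, and a hypothetical DehydratedCache); they are not Hibernate's internal classes. The point is only that the cache holds arrays of property values keyed by ID, and that each lookup reassembles a fresh object from those values.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the article's Country class (no airports).
class Country {
    final long id;
    final String code;
    final String name;

    Country(long id, String code, String name) {
        this.id = id;
        this.code = code;
        this.name = name;
    }
}

// Hypothetical illustration of storing objects as property-value arrays.
class DehydratedCache {
    private final Map<Long, Object[]> entries = new HashMap<>();

    // "Dehydrate": keep only the property values, not the object itself.
    void put(Country country) {
        entries.put(country.id, new Object[] { country.code, country.name, country.id });
    }

    // "Rehydrate": build a new instance from the stored values, or null on a miss.
    Country get(long id) {
        Object[] state = entries.get(id);
        if (state == null) {
            return null;
        }
        return new Country((Long) state[2], (String) state[0], (String) state[1]);
    }

    public static void main(String[] args) {
        DehydratedCache cache = new DehydratedCache();
        cache.put(new Country(30L, "bw", "Botswana"));
        Country copy = cache.get(30L);
        System.out.println(copy.id + " => [" + copy.code + "," + copy.name + "]");
    }
}
```

Because only values are stored, two lookups of the same ID yield two distinct objects with equal contents rather than one shared instance.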
By default, Hibernate does not cache associations. It's up to you to
decide which associations should be cached, and which associations
should be reloaded whenever the cached object is retrieved from the
second-level cache.
Association caching is a very powerful feature. The next section takes a more detailed look at it.