Hibernate EJB Tutorial

hibernate interview questions and hibernate vs ejb 3.0 persistence and ejb tutorial eclipse
GregDeamons,New Zealand,Professional
Published Date:03-08-2017
Part 1 Getting started with Hibernate and EJB 3.0

In part 1, we show you why object persistence is such a complex topic and what solutions you can apply in practice. Chapter 1 introduces the object/relational paradigm mismatch and several strategies to deal with it, foremost object/relational mapping (ORM). In chapter 2, we guide you step by step through a tutorial with Hibernate, Java Persistence, and EJB 3.0; you’ll implement and test a “Hello World” example in all variations. Thus prepared, in chapter 3 you’re ready to learn how to design and implement complex business domain models in Java, and which mapping metadata options you have available.

After reading this part of the book, you’ll understand why you need object/relational mapping, and how Hibernate, Java Persistence, and EJB 3.0 work in practice. You’ll have written your first small project, and you’ll be ready to take on more complex problems. You’ll also understand how real-world business entities can be implemented as a Java domain model, and in what format you prefer to work with object/relational mapping metadata.

Chapter 1 Understanding object/relational persistence

This chapter covers
■ Object persistence with SQL databases
■ The object/relational paradigm mismatch
■ Persistence layers in object-oriented applications
■ Object/relational mapping background

The approach to managing persistent data has been a key design decision in every software project we’ve worked on. Given that persistent data isn’t a new or unusual requirement for Java applications, you’d expect to be able to make a simple choice among similar, well-established persistence solutions. Think of web application frameworks (Struts versus WebWork), GUI component frameworks (Swing versus SWT), or template engines (JSP versus Velocity). Each of the competing solutions has various advantages and disadvantages, but they all share the same scope and overall approach.
Unfortunately, this isn’t yet the case with persistence technologies, where we see some wildly differing solutions to the same problem.

For several years, persistence has been a hot topic of debate in the Java community. Many developers don’t even agree on the scope of the problem. Is persistence a problem that is already solved by relational technology and extensions such as stored procedures, or is it a more pervasive problem that must be addressed by special Java component models, such as EJB entity beans? Should we hand-code even the most primitive CRUD (create, read, update, delete) operations in SQL and JDBC, or should this work be automated? How do we achieve portability if every database management system has its own SQL dialect? Should we abandon SQL completely and adopt a different database technology, such as object database systems? Debate continues, but a solution called object/relational mapping (ORM) now has wide acceptance. Hibernate is an open source ORM service implementation.

Hibernate is an ambitious project that aims to be a complete solution to the problem of managing persistent data in Java. It mediates the application’s interaction with a relational database, leaving the developer free to concentrate on the business problem at hand. Hibernate is a nonintrusive solution. You aren’t required to follow many Hibernate-specific rules and design patterns when writing your business logic and persistent classes; thus, Hibernate integrates smoothly with most new and existing applications and doesn’t require disruptive changes to the rest of the application.

This book is about Hibernate. We’ll cover basic and advanced features and describe some ways to develop new applications using Hibernate. Often, these recommendations won’t even be specific to Hibernate. Sometimes they will be our ideas about the best ways to do things when working with persistent data, explained in the context of Hibernate.
This book is also about Java Persistence, a new standard for persistence that is part of the updated EJB 3.0 specification. Hibernate implements Java Persistence and supports all the standardized mappings, queries, and APIs. Before we can get started with Hibernate, however, you need to understand the core problems of object persistence and object/relational mapping. This chapter explains why tools like Hibernate and specifications such as Java Persistence and EJB 3.0 are needed.

First, we define persistent data management in the context of object-oriented applications and discuss the relationship of SQL, JDBC, and Java, the underlying technologies and standards that Hibernate is built on. We then discuss the so-called object/relational paradigm mismatch and the generic problems we encounter in object-oriented software development with relational databases. These problems make it clear that we need tools and patterns to minimize the time we have to spend on the persistence-related code of our applications. After we look at alternative tools and persistence mechanisms, you’ll see that ORM is the best available solution for many scenarios. Our discussion of the advantages and drawbacks of ORM will give you the full background to make the best decision when picking a persistence solution for your own project.

We also take a look at the various Hibernate software modules, and how you can combine them to either work with Hibernate only, or with Java Persistence and EJB 3.0-compliant features.

The best way to learn Hibernate isn’t necessarily linear. We understand that you may want to try Hibernate right away. If this is how you’d like to proceed, skip to the second chapter of this book and have a look at the “Hello World” example and set up a project. We recommend that you return here at some point as you circle through the book. That way, you’ll be prepared and have all the background concepts you need for the rest of the material.
1.1 What is persistence?

Almost all applications require persistent data. Persistence is one of the fundamental concepts in application development. If an information system didn’t preserve data when it was powered off, the system would be of little practical use. When we talk about persistence in Java, we’re normally talking about storing data in a relational database using SQL. We’ll start by taking a brief look at the technology and how we use it with Java. Armed with that information, we’ll then continue our discussion of persistence and how it’s implemented in object-oriented applications.

1.1.1 Relational databases

You, like most other developers, have probably worked with a relational database. Most of us use a relational database every day. Relational technology is a known quantity, and this alone is sufficient reason for many organizations to choose it. But to say only this is to pay less respect than is due. Relational databases are entrenched because they’re an incredibly flexible and robust approach to data management. Due to the complete and consistent theoretical foundation of the relational data model, relational databases can effectively guarantee and protect the integrity of the data, among other desirable characteristics. Some people would even say that the last big invention in computing has been the relational concept for data management as first introduced by E.F. Codd (Codd, 1970) more than three decades ago.

Relational database management systems aren’t specific to Java, nor is a relational database specific to a particular application. This important principle is known as data independence. In other words, and we can’t stress this important fact enough, data lives longer than any application does.
Relational technology provides a way of sharing data among different applications, or among different technologies that form parts of the same application (the transactional engine and the reporting engine, for example). Relational technology is a common denominator of many disparate systems and technology platforms. Hence, the relational data model is often the common enterprise-wide representation of business entities.

Relational database management systems have SQL-based application programming interfaces; hence, we call today’s relational database products SQL database management systems or, when we’re talking about particular systems, SQL databases.

Before we go into more detail about the practical aspects of SQL databases, we have to mention an important issue: Although marketed as relational, a database system providing only an SQL data language interface isn’t really relational and in many ways isn’t even close to the original concept. Naturally, this has led to confusion. SQL practitioners blame the relational data model for shortcomings in the SQL language, and relational data management experts blame the SQL standard for being a weak implementation of the relational model and ideals. Application developers are stuck somewhere in the middle, with the burden to deliver something that works. We’ll highlight some important and significant aspects of this issue throughout the book, but generally we’ll focus on the practical aspects. If you’re interested in more background material, we highly recommend Practical Issues in Database Management: A Reference for the Thinking Practitioner by Fabian Pascal (Pascal, 2000).

1.1.2 Understanding SQL

To use Hibernate effectively, a solid understanding of the relational model and SQL is a prerequisite. You need to understand the relational model and topics such as normalization to guarantee the integrity of your data, and you’ll need to use your knowledge of SQL to tune the performance of your Hibernate application. Hibernate automates many repetitive coding tasks, but your knowledge of persistence technology must extend beyond Hibernate itself if you want to take advantage of the full power of modern SQL databases. Remember that the underlying goal is robust, efficient management of persistent data.

Let’s review some of the SQL terms used in this book. You use SQL as a data definition language (DDL) to create a database schema with CREATE and ALTER statements. After creating tables (and indexes, sequences, and so on), you use SQL as a data manipulation language (DML) to manipulate and retrieve data. The manipulation operations include insertions, updates, and deletions. You retrieve data by executing queries with restrictions, projections, and join operations (including the Cartesian product). For efficient reporting, you use SQL to group, order, and aggregate data as necessary. You can even nest SQL statements inside each other; this technique uses subselects.

You’ve probably used SQL for many years and are familiar with the basic operations and statements written in this language. Still, we know from our own experience that SQL is sometimes hard to remember, and some terms vary in usage. To understand this book, we must use the same terms and concepts, so we advise you to read appendix A if any of the terms we’ve mentioned are new or unclear.

If you need more details, especially about any performance aspects and how SQL is executed, get a copy of the excellent book SQL Tuning by Dan Tow (Tow, 2003). Also read An Introduction to Database Systems by Chris Date (Date, 2003) for the theory, concepts, and ideals of (relational) database systems. The latter book is an excellent reference (it’s big) for all questions you may possibly have about databases and data management.
Although the relational database is one part of ORM, the other part, of course, consists of the objects in your Java application that need to be persisted to and loaded from the database using SQL.

1.1.3 Using SQL in Java

When you work with an SQL database in a Java application, the Java code issues SQL statements to the database via the Java Database Connectivity (JDBC) API. Whether the SQL was written by hand and embedded in the Java code, or generated on the fly by Java code, you use the JDBC API to bind arguments to prepare query parameters, execute the query, scroll through the query result table, retrieve values from the result set, and so on. These are low-level data access tasks; as application developers, we’re more interested in the business problem that requires this data access. What we’d really like to write is code that saves and retrieves objects (the instances of our classes) to and from the database, relieving us of this low-level drudgery.

Because the data access tasks are often so tedious, we have to ask: Are the relational data model and (especially) SQL the right choices for persistence in object-oriented applications? We answer this question immediately: Yes! There are many reasons why SQL databases dominate the computing industry: relational database management systems are the only proven data management technology, and they’re almost always a requirement in any Java project.

However, for the last 15 years, developers have spoken of a paradigm mismatch. This mismatch explains why so much effort is expended on persistence-related concerns in every enterprise project. The paradigms referred to are object modeling and relational modeling, or perhaps object-oriented programming and SQL.

Let’s begin our exploration of the mismatch problem by asking what persistence means in the context of object-oriented application development.
First we’ll widen the simplistic definition of persistence stated at the beginning of this section to a broader, more mature understanding of what is involved in maintaining and using persistent data.

1.1.4 Persistence in object-oriented applications

In an object-oriented application, persistence allows an object to outlive the process that created it. The state of the object can be stored to disk, and an object with the same state can be re-created at some point in the future.

This isn’t limited to single objects; entire networks of interconnected objects can be made persistent and later re-created in a new process. Most objects aren’t persistent; a transient object has a limited lifetime that is bounded by the life of the process that instantiated it. Almost all Java applications contain a mix of persistent and transient objects; hence, we need a subsystem that manages our persistent data.

Modern relational databases provide a structured representation of persistent data, enabling the manipulating, sorting, searching, and aggregating of data. Database management systems are responsible for managing concurrency and data integrity; they’re responsible for sharing data between multiple users and multiple applications. They guarantee the integrity of the data through integrity rules that have been implemented with constraints. A database management system provides data-level security.

When we discuss persistence in this book, we’re thinking of all these things:
■ Storage, organization, and retrieval of structured data
■ Concurrency and data integrity
■ Data sharing

And, in particular, we’re thinking of these problems in the context of an object-oriented application that uses a domain model.

An application with a domain model doesn’t work directly with the tabular representation of the business entities; the application has its own object-oriented model of the business entities.
If the database of an online auction system has ITEM and BID tables, for example, the Java application defines Item and Bid classes. Then, instead of directly working with the rows and columns of an SQL result set, the business logic interacts with this object-oriented domain model and its runtime realization as a network of interconnected objects. Each instance of a Bid has a reference to an auction Item, and each Item may have a collection of references to Bid instances. The business logic isn’t executed in the database (as an SQL stored procedure); it’s implemented in Java in the application tier. This allows business logic to make use of sophisticated object-oriented concepts such as inheritance and polymorphism. For example, we could use well-known design patterns such as Strategy, Mediator, and Composite (Gamma and others, 1995), all of which depend on polymorphic method calls.

Now a caveat: Not all Java applications are designed this way, nor should they be. Simple applications may be much better off without a domain model. Complex applications may have to reuse existing stored procedures. SQL and the JDBC API are perfectly serviceable for dealing with pure tabular data, and the JDBC RowSet makes CRUD operations even easier. Working with a tabular representation of persistent data is straightforward and well understood.

However, in the case of applications with nontrivial business logic, the domain model approach helps to improve code reuse and maintainability significantly. In practice, both strategies are common and needed. Many applications need to execute procedures that modify large sets of data, close to the data. At the same time, other application modules could benefit from an object-oriented domain model that executes regular online transaction processing logic in the application tier. An efficient way to bring persistent data closer to the application code is required.
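The Item and Bid classes described above can be sketched in plain Java. This is only an illustration under stated assumptions: the fields name and amount and the methods placeBid() and highestBidAmount() are hypothetical and don’t come from the text. The point is that each Bid references its Item, each Item holds a collection of Bid references, and the business logic works on that object network.

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the auction domain model; everything except the
// class names Item and Bid is an assumption made for illustration.
class Item {
    private final String name;
    private final Set<Bid> bids = new LinkedHashSet<>();

    Item(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Read-only view, so that new bids must go through placeBid()
    Set<Bid> getBids() {
        return Collections.unmodifiableSet(bids);
    }

    // Creates a Bid and wires up both directions of the association
    Bid placeBid(BigDecimal amount) {
        Bid bid = new Bid(this, amount);
        bids.add(bid);
        return bid;
    }

    // Business logic written against objects, not against rows and columns
    BigDecimal highestBidAmount() {
        BigDecimal highest = BigDecimal.ZERO;
        for (Bid bid : bids) {
            if (bid.getAmount().compareTo(highest) > 0) {
                highest = bid.getAmount();
            }
        }
        return highest;
    }
}

class Bid {
    private final Item item;
    private final BigDecimal amount;

    Bid(Item item, BigDecimal amount) {
        this.item = item;
        this.amount = amount;
    }

    Item getItem() {
        return item;
    }

    BigDecimal getAmount() {
        return amount;
    }
}
```

Nothing in this sketch knows about the ITEM and BID tables; moving state between these objects and those tables is exactly the work a persistence layer has to do.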
If we consider SQL and relational databases again, we finally observe the mismatch between the two paradigms. SQL operations such as projection and join always result in a tabular representation of the resulting data. (This is known as closure; the result of an operation on relations is always a relation.) This is quite different from the network of interconnected objects used to execute the business logic in a Java application. These are fundamentally different models, not just different ways of visualizing the same model.

With this realization, you can begin to see the problems (some well understood and some less well understood) that must be solved by an application that combines both data representations: an object-oriented domain model and a persistent relational model. Let’s take a closer look at this so-called paradigm mismatch.

1.2 The paradigm mismatch

The object/relational paradigm mismatch can be broken into several parts, which we’ll examine one at a time. Let’s start our exploration with a simple example that is problem free. As we build on it, you’ll begin to see the mismatch appear.

Suppose you have to design and implement an online e-commerce application. In this application, you need a class to represent information about a user of the system, and another class to represent information about the user’s billing details, as shown in figure 1.1.

In this diagram, you can see that a User has many BillingDetails. You can navigate the relationship between the classes in both directions. The classes representing these entities may be extremely simple:

    public class User {
        private String username;
        private String name;
        private String address;
        private Set billingDetails;

        // Accessor methods (getter/setter), business methods, etc.
        ...
    }
    public class BillingDetails {
        private String accountNumber;
        private String accountName;
        private String accountType;
        private User user;

        // Accessor methods (getter/setter), business methods, etc.
        ...
    }

Figure 1.1 A simple UML class diagram of the User and BillingDetails entities

Note that we’re only interested in the state of the entities with regard to persistence, so we’ve omitted the implementation of property accessors and business methods (such as getUsername() or billAuction()).

It’s easy to come up with a good SQL schema design for this case:

    create table USERS (
        USERNAME varchar(15) not null primary key,
        NAME varchar(50) not null,
        ADDRESS varchar(100)
    )

    create table BILLING_DETAILS (
        ACCOUNT_NUMBER varchar(10) not null primary key,
        ACCOUNT_NAME varchar(50) not null,
        ACCOUNT_TYPE varchar(2) not null,
        USERNAME varchar(15) foreign key references USERS
    )

The relationship between the two entities is represented as the foreign key, USERNAME, in BILLING_DETAILS. For this simple domain model, the object/relational mismatch is barely in evidence; it’s straightforward to write JDBC code to insert, update, and delete information about users and billing details.

Now, let’s see what happens when we consider something a little more realistic. The paradigm mismatch will be visible when we add more entities and entity relationships to our application.

The most glaringly obvious problem with our current implementation is that we’ve designed an address as a simple String value. In most systems, it’s necessary to store street, city, state, country, and ZIP code information separately. Of course, we could add these properties directly to the User class, but because it’s highly likely that other classes in the system will also carry address information, it makes more sense to create a separate Address class. The updated model is shown in figure 1.2.

Figure 1.2 The User has an Address

Should we also add an ADDRESS table? Not necessarily. It’s common to keep address information in the USERS table, in individual columns. This design is likely to perform better, because a table join isn’t needed if you want to retrieve the user and address in a single query. The nicest solution may even be to create a user-defined SQL datatype to represent addresses, and to use a single column of that new type in the USERS table instead of several new columns.

Basically, we have the choice of adding either several columns or a single column (of a new SQL datatype). This is clearly a problem of granularity.

1.2.1 The problem of granularity

Granularity refers to the relative size of the types you’re working with.

Let’s return to our example. Adding a new datatype to our database catalog, to store Address Java instances in a single column, sounds like the best approach. A new Address type (class) in Java and a new ADDRESS SQL datatype should guarantee interoperability. However, you’ll find various problems if you check the support for user-defined datatypes (UDT) in today’s SQL database management systems.

UDT support is one of a number of so-called object-relational extensions to traditional SQL. This term alone is confusing, because it means that the database management system has (or is supposed to support) a sophisticated datatype system, something you take for granted if somebody sells you a system that can handle data in a relational fashion. Unfortunately, UDT support is a somewhat obscure feature of most SQL database management systems and certainly isn’t portable between different systems. Furthermore, the SQL standard supports user-defined datatypes, but poorly.

This limitation isn’t the fault of the relational data model. You can consider the failure to standardize such an important piece of functionality as fallout from the object-relational database wars between vendors in the mid-1990s.
Today, most developers accept that SQL products have limited type systems, no questions asked. However, even with a sophisticated UDT system in our SQL database management system, we would likely still duplicate the type declarations, writing the new type in Java and again in SQL. Attempts to find a solution for the Java space, such as SQLJ, unfortunately, have not had much success.

For these and whatever other reasons, use of UDTs or Java types inside an SQL database isn’t common practice in the industry at this time, and it’s unlikely that you’ll encounter a legacy schema that makes extensive use of UDTs. We therefore can’t and won’t store instances of our new Address class in a single new column that has the same datatype as the Java layer.

Our pragmatic solution for this problem has several columns of built-in vendor-defined SQL types (such as boolean, numeric, and string datatypes). The USERS table is usually defined as follows:

    create table USERS (
        USERNAME varchar(15) not null primary key,
        NAME varchar(50) not null,
        ADDRESS_STREET varchar(50),
        ADDRESS_CITY varchar(15),
        ADDRESS_STATE varchar(15),
        ADDRESS_ZIPCODE varchar(5),
        ADDRESS_COUNTRY varchar(15)
    )

Classes in our domain model come in a range of different levels of granularity: from coarse-grained entity classes like User, to finer-grained classes like Address, down to simple String-valued properties such as zipcode. In contrast, just two levels of granularity are visible at the level of the SQL database: tables such as USERS, and columns such as ADDRESS_ZIPCODE.

Many simple persistence mechanisms fail to recognize this mismatch and so end up forcing the less flexible SQL representation upon the object model. We’ve seen countless User classes with properties named zipcode!

It turns out that the granularity problem isn’t especially difficult to solve. We probably wouldn’t even discuss it, were it not for the fact that it’s visible in so many existing systems.
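To make the granularity gap concrete, here is a minimal hand-written sketch, not the solution the book develops later. It keeps a fine-grained Address class in the domain model and flattens it into the ADDRESS_* columns of the USERS table shown above. The UserRowMapper class and its toRow() method are hypothetical names introduced only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Fine-grained value type: exists in the Java model, has no table of its own
class Address {
    final String street;
    final String city;
    final String state;
    final String zipcode;
    final String country;

    Address(String street, String city, String state, String zipcode, String country) {
        this.street = street;
        this.city = city;
        this.state = state;
        this.zipcode = zipcode;
        this.country = country;
    }
}

// Coarse-grained entity: corresponds to one row of USERS
class User {
    final String username;
    final String name;
    final Address address;

    User(String username, String name, Address address) {
        this.username = username;
        this.name = name;
        this.address = address;
    }
}

// Hypothetical helper (an assumption for this sketch): flattens the
// User/Address object graph into the column names of the USERS table.
class UserRowMapper {
    static Map<String, Object> toRow(User user) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("USERNAME", user.username);
        row.put("NAME", user.name);
        Address a = user.address;
        // The address columns are nullable in the schema, so a missing Address maps to nulls
        row.put("ADDRESS_STREET", a == null ? null : a.street);
        row.put("ADDRESS_CITY", a == null ? null : a.city);
        row.put("ADDRESS_STATE", a == null ? null : a.state);
        row.put("ADDRESS_ZIPCODE", a == null ? null : a.zipcode);
        row.put("ADDRESS_COUNTRY", a == null ? null : a.country);
        return row;
    }
}
```

The three levels in Java (User, Address, String) collapse to two in SQL (table, column); without a mapping tool, code like toRow() has to be written and maintained by hand for every such class.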
We describe the solution to this problem in chapter 4, section 4.4, “Fine-grained models and mappings.”

A much more difficult and interesting problem arises when we consider domain models that rely on inheritance, a feature of object-oriented design we may use to bill the users of our e-commerce application in new and interesting ways.

1.2.2 The problem of subtypes

In Java, you implement type inheritance using superclasses and subclasses. To illustrate why this can present a mismatch problem, let’s add to our e-commerce application so that we now can accept not only bank account billing, but also credit and debit cards. The most natural way to reflect this change in the model is to use inheritance for the BillingDetails class.

We may have an abstract BillingDetails superclass, along with several concrete subclasses: CreditCard, BankAccount, and so on. Each of these subclasses defines slightly different data (and completely different functionality that acts on that data). The UML class diagram in figure 1.3 illustrates this model.

Figure 1.3 Using inheritance for different billing strategies

SQL should probably include standard support for supertables and subtables. This would effectively allow us to create a table that inherits certain columns from its parent. However, such a feature would be questionable, because it would introduce a new notion: virtual columns in base tables. Traditionally, we expect virtual columns only in virtual tables, which are called views. Furthermore, on a theoretical level, the inheritance we applied in Java is type inheritance. A table isn’t a type, so the notion of supertables and subtables is questionable.
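On the Java side, the class hierarchy just described might look like the following sketch. Only the class names BillingDetails, CreditCard, and BankAccount come from the text; the fields and the describe() method are assumptions added to show that each subclass carries different data and different behavior.

```java
// Abstract superclass: state and behavior shared by every billing strategy
abstract class BillingDetails {
    private final String owner;

    protected BillingDetails(String owner) {
        this.owner = owner;
    }

    String getOwner() {
        return owner;
    }

    // Hypothetical business method; each subclass supplies its own behavior
    abstract String describe();
}

class CreditCard extends BillingDetails {
    private final String number;
    private final String expiry;

    CreditCard(String owner, String number, String expiry) {
        super(owner);
        this.number = number;
        this.expiry = expiry;
    }

    String getExpiry() {
        return expiry;
    }

    @Override
    String describe() {
        return "Credit card " + number + " owned by " + getOwner();
    }
}

class BankAccount extends BillingDetails {
    private final String account;
    private final String bankName;

    BankAccount(String owner, String account, String bankName) {
        super(owner);
        this.account = account;
        this.bankName = bankName;
    }

    @Override
    String describe() {
        return "Account " + account + " at " + bankName + " owned by " + getOwner();
    }
}
```

Code that holds a BillingDetails reference can call describe() without knowing the concrete subclass. A table definition has no direct counterpart for this: the subclass-specific fields need columns somewhere, and nothing in a plain table says which subclass a row belongs to.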
In any case, we can take the short route here and observe that SQL database products don’t generally implement type or table inheritance, and if they do implement it, they don’t follow a standard syntax and usually expose you to data integrity problems (limited integrity rules for updatable views).

In chapter 5, section 5.1, “Mapping class inheritance,” we discuss how ORM solutions such as Hibernate solve the problem of persisting a class hierarchy to a database table or tables. This problem is now well understood in the community, and most solutions support approximately the same functionality.

But we aren’t finished with inheritance. As soon as we introduce inheritance into the model, we have the possibility of polymorphism.

The User class has an association to the BillingDetails superclass. This is a polymorphic association. At runtime, a User object may reference an instance of any of the subclasses of BillingDetails. Similarly, we want to be able to write polymorphic queries that refer to the BillingDetails class, and have the query return instances of its subclasses.

SQL databases also lack an obvious way (or at least a standardized way) to represent a polymorphic association. A foreign key constraint refers to exactly one target table; it isn’t straightforward to define a foreign key that refers to multiple tables. We’d have to write a procedural constraint to enforce this kind of integrity rule.

The result of this mismatch of subtypes is that the inheritance structure in your model must be persisted in an SQL database that doesn’t offer an inheritance strategy. Fortunately, three of the inheritance mapping solutions we show in chapter 5 are designed to accommodate the representation of polymorphic associations and the efficient execution of polymorphic queries.

The next aspect of the object/relational mismatch problem is the issue of object identity. You probably noticed that we defined USERNAME as the primary key of our USERS table.
Was that a good choice? How do we handle identical objects in Java?

1.2.3 The problem of identity

Although the problem of object identity may not be obvious at first, we’ll encounter it often in our growing and expanding e-commerce system, such as when we need to check whether two objects are identical. There are three ways to tackle this problem: two in the Java world and one in our SQL database. As expected, they work together only with some help.

Java objects define two different notions of sameness:
■ Object identity (roughly equivalent to memory location, checked with a==b)
■ Equality as determined by the implementation of the equals() method (also called equality by value)

On the other hand, the identity of a database row is expressed as the primary key value. As you’ll see in chapter 9, section 9.2, “Object identity and equality,” neither equals() nor == is naturally equivalent to the primary key value. It’s common for several nonidentical objects to simultaneously represent the same row of the database, for example, in concurrently running application threads. Furthermore, some subtle difficulties are involved in implementing equals() correctly for a persistent class.

Let’s discuss another problem related to database identity with an example. In our table definition for USERS, we used USERNAME as a primary key. Unfortunately, this decision makes it difficult to change a username; we need to update not only the USERNAME column in USERS, but also the foreign key column in BILLING_DETAILS. To solve this problem, later in the book we’ll recommend that you use surrogate keys whenever you can’t find a good natural key (we’ll also discuss what makes a key good). A surrogate key column is a primary key column with no meaning to the user; in other words, a key that isn’t presented to the user and is only used for identification of data inside the software system.
For example, we may change our table definitions to look like this:

    create table USERS (
        USER_ID bigint not null primary key,
        USERNAME varchar(15) not null unique,
        NAME varchar(50) not null,
        ...
    )

    create table BILLING_DETAILS (
        BILLING_DETAILS_ID bigint not null primary key,
        ACCOUNT_NUMBER varchar(10) not null unique,
        ACCOUNT_NAME varchar(50) not null,
        ACCOUNT_TYPE varchar(2) not null,
        USER_ID bigint foreign key references USERS
    )

The USER_ID and BILLING_DETAILS_ID columns contain system-generated values. These columns were introduced purely for the benefit of the data model, so how (if at all) should they be represented in the domain model? We discuss this question in chapter 4, section 4.2, “Mapping entities with identity,” and we find a solution with ORM.

In the context of persistence, identity is closely related to how the system handles caching and transactions. Different persistence solutions have chosen different strategies, and this has been an area of confusion. We cover all these interesting topics, and show how they’re related, in chapters 10 and 13.

So far, the skeleton e-commerce application we’ve designed has identified the mismatch problems with mapping granularity, subtypes, and object identity. We’re almost ready to move on to other parts of the application, but first we need to discuss the important concept of associations: how the relationships between our classes are mapped and handled. Is the foreign key in the database all you need?

1.2.4 Problems relating to associations

In our domain model, associations represent the relationships between entities. The User, Address, and BillingDetails classes are all associated; but unlike Address, BillingDetails stands on its own. BillingDetails instances are stored in their own table. Association mapping and the management of entity associations are central concepts in any object persistence solution.
Object-oriented languages represent associations using object references; but in the relational world, an association is represented as a foreign key column, with copies of key values (and a constraint to guarantee integrity). There are substantial differences between the two representations.

Object references are inherently directional; the association is from one object to the other. They’re pointers. If an association between objects should be navigable in both directions, you must define the association twice, once in each of the associated classes. You’ve already seen this in the domain model classes:

    public class User {
        private Set billingDetails;
        ...
    }

    public class BillingDetails {
        private User user;
        ...
    }

On the other hand, foreign key associations aren’t by nature directional. Navigation has no meaning for a relational data model because you can create arbitrary data associations with table joins and projection. The challenge is to bridge a completely open data model, which is independent of the application that works with the data, to an application-dependent navigational model, a constrained view of the associations needed by this particular application.

It isn’t possible to determine the multiplicity of a unidirectional association by looking only at the Java classes. Java associations can have many-to-many multiplicity. For example, the classes could look like this:

    public class User {
        private Set billingDetails;
        ...
    }

    public class BillingDetails {
        private Set users;
        ...
    }

Table associations, on the other hand, are always one-to-many or one-to-one. You can see the multiplicity immediately by looking at the foreign key definition.
The following is a foreign key declaration on the BILLING_DETAILS table for a one-to-many association (or, if read in the other direction, a many-to-one association):

    USER_ID bigint foreign key references USERS

These are one-to-one associations:

    USER_ID bigint unique foreign key references USERS
    BILLING_DETAILS_ID bigint primary key foreign key references USERS

If you wish to represent a many-to-many association in a relational database, you must introduce a new table, called a link table. This table doesn’t appear anywhere in the domain model. For our example, if we consider the relationship between the user and the billing information to be many-to-many, the link table is defined as follows:

    create table USER_BILLING_DETAILS (
        USER_ID bigint foreign key references USERS,
        BILLING_DETAILS_ID bigint foreign key references BILLING_DETAILS,
        PRIMARY KEY (USER_ID, BILLING_DETAILS_ID)
    )

We discuss association and collection mappings in great detail in chapters 6 and 7.

So far, the issues we’ve considered are mainly structural. We can see them by considering a purely static view of the system. Perhaps the most difficult problem in object persistence is a dynamic problem. It concerns associations, and we’ve already hinted at it when we drew a distinction between object network navigation and table joins in section 1.1.4, “Persistence in object-oriented applications.” Let’s explore this significant mismatch problem in more depth.

1.2.5 The problem of data navigation

There is a fundamental difference in the way you access data in Java and in a relational database. In Java, when you access a user’s billing information, you call aUser.getBillingDetails().getAccountNumber() or something similar. This is the most natural way to access object-oriented data, and it’s often described as walking the object network. You navigate from one object to another, following pointers between instances.
Unfortunately, this isn’t an efficient way to retrieve data from an SQL database.

The single most important thing you can do to improve the performance of data access code is to minimize the number of requests to the database. The most obvious way to do this is to minimize the number of SQL queries. (Of course, there are other more sophisticated ways that follow as a second step.)

Therefore, efficient access to relational data with SQL usually requires joins between the tables of interest. The number of tables included in the join when retrieving data determines the depth of the object network you can navigate in memory. For example, if you need to retrieve a User and aren’t interested in the user’s billing information, you can write this simple query:

    select * from USERS u where u.USER_ID = 123

On the other hand, if you need to retrieve a User and then subsequently visit each of the associated BillingDetails instances (let’s say, to list all the user’s credit cards), you write a different query:

    select *
    from USERS u
    left outer join BILLING_DETAILS bd on bd.USER_ID = u.USER_ID
    where u.USER_ID = 123

As you can see, to efficiently use joins you need to know what portion of the object network you plan to access when you retrieve the initial User, before you start navigating the object network.

On the other hand, any object persistence solution provides functionality for fetching the data of associated objects only when the object is first accessed. However, this piecemeal style of data access is fundamentally inefficient in the context of a relational database, because it requires executing one statement for each node or collection of the object network that is accessed. This is the dreaded n+1 selects problem.

This mismatch in the way you access objects in Java and in a relational database is perhaps the single most common source of performance problems in Java applications.
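
The difference in statement counts can be made concrete with a small simulation. This is only a sketch under stated assumptions: FakeDatabase is a hypothetical in-memory stand-in that counts the statements it would execute, not a real JDBC or Hibernate API, and the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: FakeDatabase only counts statements; it is not a real database API.
final class NPlusOneDemo {

    static final class FakeDatabase {
        // USER_ID -> account numbers of that user's BILLING_DETAILS rows
        private final Map<Long, List<String>> billingByUser = new LinkedHashMap<>();
        int statements = 0;

        void addUser(long userId, String... accountNumbers) {
            billingByUser.put(userId, Arrays.asList(accountNumbers));
        }

        // Simulates: select USER_ID from USERS
        List<Long> selectUserIds() {
            statements++;
            return new ArrayList<>(billingByUser.keySet());
        }

        // Simulates: select * from BILLING_DETAILS where USER_ID = ?
        List<String> selectBillingFor(long userId) {
            statements++;
            return billingByUser.get(userId);
        }

        // Simulates: select * from USERS u left outer join BILLING_DETAILS bd on ...
        Map<Long, List<String>> selectUsersJoinBilling() {
            statements++;
            return new LinkedHashMap<>(billingByUser);
        }
    }

    // Piecemeal access: one statement for the users, then one more per user.
    static int accountsOnDemand(FakeDatabase db) {
        int count = 0;
        for (long userId : db.selectUserIds()) {
            count += db.selectBillingFor(userId).size();
        }
        return count;
    }

    // Join access: a single statement fetches users and billing details together.
    static int accountsWithJoin(FakeDatabase db) {
        int count = 0;
        for (List<String> accounts : db.selectUsersJoinBilling().values()) {
            count += accounts.size();
        }
        return count;
    }

    // Invented sample data: three users, three billing details in total.
    static FakeDatabase sampleDatabase() {
        FakeDatabase db = new FakeDatabase();
        db.addUser(1L, "A1", "A2");
        db.addUser(2L, "B1");
        db.addUser(3L);
        return db;
    }

    public static void main(String[] args) {
        FakeDatabase lazy = sampleDatabase();
        accountsOnDemand(lazy);
        System.out.println(lazy.statements);   // 4 statements: 1 + n for n = 3 users

        FakeDatabase joined = sampleDatabase();
        accountsWithJoin(joined);
        System.out.println(joined.statements); // 1 statement
    }
}
```

Both methods return the same count of accounts; only the number of round trips differs, and the gap grows linearly with the number of users visited.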
There is a natural tension between too many selects and too big selects, which retrieve unnecessary information into memory. Yet, although we’ve been blessed with innumerable books and magazine articles advising us to use StringBuffer for string concatenation, it seems impossible to find any advice about strategies for avoiding the n+1 selects problem. Fortunately, Hibernate provides sophisticated features for efficiently and transparently fetching networks of objects from the database to the application accessing them. We discuss these features in chapters 13, 14, and 15.

1.2.6 The cost of the mismatch

We now have quite a list of object/relational mismatch problems, and it will be costly (in time and effort) to find solutions, as you may know from experience. This cost is often underestimated, and we think this is a major reason for many failed software projects. In our experience (regularly confirmed by developers we talk to), the main purpose of up to 30 percent of the Java application code written is to handle the tedious SQL/JDBC and manual bridging of the object/relational paradigm mismatch. Despite all this effort, the end result still doesn’t feel quite right. We’ve seen projects nearly sink due to the complexity and inflexibility of their database abstraction layers. We also see Java developers (and DBAs) quickly lose their confidence when design decisions about the persistence strategy for a project have to be made.

One of the major costs is in the area of modeling. The relational and domain models must both encompass the same business entities, but an object-oriented purist will model these entities in a different way than an experienced relational data modeler would. The usual solution to this problem is to bend and twist the domain model and the implemented classes until they match the SQL database schema. (Which, following the principle of data independence, is certainly a safe long-term choice.)
This can be done successfully, but only at the cost of losing some of the advantages of object orientation. Keep in mind that relational modeling is underpinned by relational theory. Object orientation has no such rigorous mathematical definition or body of theoretical work, so we can’t look to mathematics to explain how we should bridge the gap between the two paradigms; there is no elegant transformation waiting to be discovered. (Doing away with Java and SQL, and starting from scratch isn’t considered elegant.)

The domain modeling mismatch isn’t the only source of the inflexibility and the lost productivity that lead to higher costs. A further cause is the JDBC API itself. JDBC and SQL provide a statement-oriented (that is, command-oriented) approach to moving data to and from an SQL database. If you want to query or manipulate data, the tables and columns involved must be specified at least three times (insert, update, select), adding to the time required for design and implementation. The distinct dialects for every SQL database management system don’t improve the situation.

To round out your understanding of object persistence, and before we approach possible solutions, we need to discuss application architecture and the role of a persistence layer in typical application design.

1.3 Persistence layers and alternatives

In a medium- or large-sized application, it usually makes sense to organize classes by concern. Persistence is one concern; others include presentation, workflow, and business logic.[1] A typical object-oriented architecture includes layers of code that represent the concerns. It’s normal and certainly best practice to group all classes and components responsible for persistence into a separate persistence layer in a layered system architecture.

In this section, we first look at the layers of this type of architecture and why we use them.
After that, we focus on the layer we’re most interested in, the persistence layer, and some of the ways it can be implemented.

1.3.1 Layered architecture

A layered architecture defines interfaces between code that implements the various concerns, allowing changes to be made to the way one concern is implemented without significant disruption to code in the other layers. Layering also determines the kinds of interlayer dependencies that occur. The rules are as follows:

■ Layers communicate from top to bottom. A layer is dependent only on the layer directly below it.
■ Each layer is unaware of any other layers except for the layer just below it.

Different systems group concerns differently, so they define different layers. A typical, proven, high-level application architecture uses three layers: one each for presentation, business logic, and persistence, as shown in figure 1.4.

[1] There are also the so-called cross-cutting concerns, which may be implemented generically, by framework code, for example. Typical cross-cutting concerns include logging, authorization, and transaction demarcation.

Let’s take a closer look at the layers and elements in the diagram: