The Anti-Pattern – EAV(il) Database Design ?

Early evening TV in our house is Soap time. Deb annexes the remote control, after which we are treated to an
assortment of angry women being angry with each other in a variety of accents originating from the North of England.
It could be worse, I suppose. We could be subjected to the offering on the other main channel ( angry London women being angry at each other in accents originating from the South East of England).
Then again, either is preferrable to an angry Welsh woman being angry at you in a Welsh accent.
Ok then, how do you make a database professional hot under the collar ? Mention the EAV design pattern.

This pattern goes by many names, most commonly :

  • EAV – Entity-Attribute-Value
  • OTLT – One True Lookup Table
  • Open Schema
  • Diabolically Enticing Method Of Data Storage (DEMONS)

OK. I made that last one up.
It is with some trepidation ( and having donned precautionary flame-proof underpants) that I am embarking on an exploration on the nature of EAV and whether it can ever be appropriate for use in a Database. Before we go any further though, I’d like to take a moment to clarify exactly what the term “database” means in the context of this discussion

NoSQL, No comment

We’re talking here about Relational Database Management Systems (RDBMS).
There are a number of NoSQL databases out there, which are another matter entirely.
The discussion here pertains to RDBMSs only ( and Oracle in particular).
Therefore, the most relevant comparison is between EAV and the traditional Relational Design approach.

Now that’s out of the way, it’s time to introduce the protagonists….

In the Red Corner…Relational Design

Supplied with the Oracle database is the example HR schema.
This has been designed, as you might expect, using the relational approach. The Physical Data Model looks like this :

A table relationship diagram, just like Mother used to make
A table relationship diagram, just like Mother used to make

Entities such as Employees and Departments are implemented as discreet physical tables. Attributes of these entities are implemented as columns on those tables.
Primary Keys for each entity are identified ( sometimes implementing a synthetic key for convenience).
Relationships between entities are implemented as Referrential Integrity Constraints.

This structure is easy to follow. Well, it’s easy to follow when done properly.The TRD provides a map as to how the data in the application hangs together.

In the Blue Corner…The EAV Pattern

The EAV pattern dictates a rather simpler approach to database design. Rather than having a discreet table for each entity and it’s attributes, it holds everything together in what amounts to a name/value pair.
The resulting data structure, as the name of the pattern suggests, holds details of the entity, the attribute of that entity, and the value of that attribute.
It will also hold a unique key to identify the instance of the entity to which the value relates.
There are variations on this particular theme. EAV can be implemented with a catalogue of entities and attributes and used to define them to a greater or lesser extent. Separate tables can be implemented to account for different value datatypes. It can even be implemented for specific groups of Entities within an application.

For illustrative purposes, I’m going to use a fairly simple three table model containing :

  • ENTITIES table – a list of valid entities for the application
  • ATTRIBUTES table – a list of valid attributes for a given entity
  • ATTRIBUTE_VALUES table – the values for each entity/attribute combination

The resulting DDL is :

CREATE TABLE entities
(
    entity_name VARCHAR2(100) PRIMARY KEY,
    description VARCHAR2(4000)
)
/

CREATE TABLE attributes
(
    attr_id NUMBER PRIMARY KEY,
    attribute_name VARCHAR2(100),
    entity_name VARCHAR2(100) NOT NULL REFERENCES entities( entity_name),
    CONSTRAINT attr_uk UNIQUE (attribute_name, entity_name)
)
/

CREATE TABLE attribute_values
(
    attr_id NUMBER NOT NULL REFERENCES attributes( attr_id),
    record_id NUMBER NOT NULL,
    value VARCHAR2(4000),
    CONSTRAINT av_pk PRIMARY KEY ( attr_id, record_id)
)
/

So, we have a list of unique entities ( ENTITIES table). Attributes of those entities ( also unique within an entity) are stored in the ATTRIBUTES table.
For ease of reference, we’ve created a synthetic key on ATTRIBUTES (attr_id).
This is used as a Foreign Key in ATTRIBUTE_VALUES.
We could make the model more complex to allow values to be stored in their native datatype by either having an ATTRIBUTE_VALUES table for each type, or by having a value column in the table for each type. For the purposes of this discussion however, I’ve decided to keep it simple.

Adding Entities and Attributes in EAV

Unlike the Relational Model, if we want to add entities and attributes to our EAV model, we simply need to insert the requisite details :

--
-- Create the DEPARTMENT and EMPLOYEE entities in EAV
--
INSERT INTO entities( entity_name, description)
VALUES('DEPARTMENTS', 'The Departments in the company')
/

INSERT INTO entities( entity_name, description)
VALUES('EMPLOYEES', 'All of the Employees in the company')
/

--
-- Now for some attributes of Departments 
--
INSERT INTO attributes( attr_id, attribute_name, entity_name)
VALUES(1, 'DEPARTMENT_ID', 'DEPARTMENTS')
/

INSERT INTO attributes( attr_id, attribute_name, entity_name)
VALUES(2, 'DEPARTMENT_NAME', 'DEPARTMENTS')
/

--
-- And for Employees
--
INSERT INTO attributes( attr_id, attribute_name, entity_name)
VALUES(3, 'FIRST_NAME', 'EMPLOYEES')
/

INSERT INTO attributes( attr_id, attribute_name, entity_name)
VALUES(4, 'LAST_NAME', 'EMPLOYEES')
/

INSERT INTO attributes( attr_id, attribute_name, entity_name)
VALUES(5, 'DEPARTMENT_ID', 'EMPLOYEES')
/

COMMIT;

Normally in Oracle, you’d use a sequence to generate the attr_id. Once again, I’ve left this out for brevity as it’s not relevant to the discussion at hand.

The comparative ease with which attributes can be added is what leads proponents of this pattern to characterise it as flexible.

Entering Data

Inserting records into the EAV implementation is a little different to the Relational one.

Adding a Department and a couple of employee records in the relational model would require a single insert statement for each record :

--
-- Create a new department in the relational model
--
INSERT INTO departments(department_id, department_name)
VALUES(300, 'OUR HOUSE')
/

--
-- And a couple of "Employees"
--
INSERT INTO employees( employee_id, first_name, last_name, department_id)
VALUES( 1, 'DEB', 'LOVELY', 300)
/

INSERT INTO employees( employee_id, first_name, last_name, department_id)
VALUES( 2, 'MIKE', 'LESS-LOVELY', 300)
/

COMMIT;

To save the same information in the EAV model would require something like :

--
-- Create a new Department
--
-- Department ID 
--
INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(1, 1, '300')
/

--
-- Now the Department Name
--
INSERT INTO attribute_values(attr_id, record_id, value)
VALUES(2, 1, 'OUR HOUSE')
/

--
-- Create the new Employee Records
--

--
-- Ladies first !
--
INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(3, 1, 'DEB')
/

INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(4, 1, 'LOVELY')
/

INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(5, 1, '300')
/


--
-- Last...and least...
--
INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(3, 2, 'MIKE')
/

INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(4, 2, 'LESS-LOVELY')
/

INSERT INTO attribute_values( attr_id, record_id, value)
VALUES(5, 2, '300')
/

COMMIT;

Getting Data Out

As you’d expect from the name, retrieving an array of name-value pairs using EAV is reasonably straightforward :

SELECT av.record_id, attr.attribute_name, av.value
FROM attributes attr, attribute_values av
WHERE attr.attr_id = av.attr_id
AND attr.entity_name = 'EMPLOYEES'
ORDER BY av.record_id
/

Nothing too scary there. OK, what if we want to get the department name instead of the department id.

In a relational model, the query would be fairly trivial :

SELECT dept.department_name, emp.first_name, emp.last_name
FROM employees emp, departments dept
WHERE emp.department_id = dept.department_id
ORDER BY emp.employee_id
/

The EAV query, however, is more of a challenge :

WITH dept AS
(
    SELECT did.value as department_id, dname.value as department_name
    FROM attribute_values did, attribute_values dname, attributes atr_did,
        attributes atr_name
    WHERE did.attr_id = atr_did.attr_id
    AND dname.attr_id = atr_name.attr_id
    AND did.record_id = dname.record_id
    AND atr_did.entity_name = atr_name.entity_name
    AND atr_did.entity_name = 'DEPARTMENTS'
    AND atr_did.attribute_name = 'DEPARTMENT_ID'
    AND atr_name.attribute_name = 'DEPARTMENT_NAME'
),
emp AS
(
    SELECT did.value as department_id,
        fname.value as first_name,
        lname.value as last_name
    FROM attribute_values did, attribute_values fname, attribute_values lname, 
        attributes atr_did, attributes atr_fname, attributes atr_lname
    WHERE did.attr_id = atr_did.attr_id
    AND fname.attr_id = atr_fname.attr_id
    AND lname.attr_id = atr_lname.attr_id
    AND did.record_id = fname.record_id
    AND did.record_id = lname.record_id
    AND atr_did.entity_name = atr_fname.entity_name
    AND atr_did.entity_name = atr_lname.entity_name
    AND atr_did.entity_name = 'EMPLOYEES'
    AND atr_did.attribute_name = 'DEPARTMENT_ID'
    AND atr_fname.attribute_name = 'FIRST_NAME'
    AND atr_lname.attribute_name = 'LAST_NAME'
)
SELECT dept.department_name, emp.first_name, emp.last_name
FROM dept, emp 
WHERE dept.department_id = emp.department_id
AND dept.department_id = '300'
/

Techniques employed to mitigate this complexity are typically either to create views (as in this example) to present the data in a more SQL-friendly manner, or to programatically generate queries on-the-fly.

Such code generation is perfectly possible, and the sophistication of the generated queries will depend on the meta-data being stored in the ENTITY and ATTRIBUTE tables.
Interestingly, some proponents of EAV see this issue as being a problem with SQL’s lack of flexibility rather than with the pattern itself.
We’ll come back to that in a bit.

The Attractions of EAV

Flexibility

As mentioned above, the main technical attraction of EAV is the flexibility it provides in terms of defining new attributes…and entities, come to that.
Not only are such changes an insert statement away, it also allows attribute and entity names of any reasonable length you desire. In Oracle 11g ( and since time immemorial), the maximum length of a table or column is 30 characters.
In the above example code, we have the facility to define entity and attribute names of up to 100 characters.
A less technical, but still pertinent point – in large corporations, changes to the data model tend to have to go through some exacting (and, in some cases, painfully beaurocratic) approval process. The EAV approach can be seen as a way of by-passing this tangle of red tape.

Simplicity

The simplicity of the data model ( in our case, create three tables, job done), means that modelling does not take a great deal of time.
Most of the world seems to be going Agile these days ( or at least, using Agile terms). Having that data model sticky move across the Sprint Board in a single sprint is a sight to warm the cockles of a Scrum Master’s heart.
In other contexts, it may be the Web Designer, who needs to develop a data-driven site, who sees the advantages of EAV. Minimal messing about with the database design so they can get on with what they do best.

Reusability

You can use this model to store more-or-less anything you want. You never have to design a database ever again.

I mean, let’s face it, what could possibly go wrong ?

The nature of Relational Databases

SQL

It is an eternal truth that the biggest factor affecting the performance of an application running on an RDBMS is the design of it’s data model.

SQL is intrinsic to an RDBMS in the same way that your brain is intrinsic to your body.
Whatever code you write to do anything with data in an RDBMS, at some level a SQL statement will be executed.
SQL has it’s roots in the same relational theory that gave rise to relational databases.
Furthermore, SQL is a declarative language. You tell it what you want it to do, it has to work out how best to do it.
You can help SQL become more efficient by giving the optimizer tools to help it.
These tools include indexes, histograms, even RI constraints.
These tools are, at best, of limited value in an EAV model.
Furthermore, distribution of data becomes somewhat problematic as most of it is stored in one (or few) tables.
Of course, this can be mitigated to an extent by means of partitioning. However, in Oracle, partitioning is an additional cost feature of the database Enterprise Edition, which is not what you would call cheap.

The Catalog

Part of the definition of a Relational Database is that it retains a catalog of all of the objects held within it.
In Oracle, this catalog is known as the Data Dictionary.
If you create an object, it’s automatically saved in tables in the Data Dictionary.
If you create a table, you can find it in the USER_TABLES data dictionary view. A table’s columns can be found in the USER_TAB_COLS data dictionary view.
To the developer, all of this meta-data is “for free”. No special code needs to be written to ensure that this meta data is stored, it just happens automatically.

The problem with EAV…

It’s at this point where we can spell out the catch. Or catches.
Relational Database Management Systems tend to operate most efficiently when application data is arranged in broadly relational structures.
An EAV approach is likely to run into a number of challenges.

Performance degradation over time

Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
Applications based on the relational approach are also likely to hit this problem. The difference is that the tools available to deal with these issues are largely unavailable for the EAV model.
When tuning problem statements, the first task is usually to look at index usage. Should there be an index ? Is there an index which is not being used where it should ? etc.
Indexes are built on one or more columns of a table. You can’t add an index to an attribute in our attribute_values table.

Reporting

We’ve already looked at the challenges presented in getting reports out of the EAV design.
Whatever the original requirements an application has been built to, it’s quite likely that at some point, someone will ask for some reporting/analysis on the data in the application.
This is likely to prove painful both in terms of complexity and, once again, performance.

Data Integrigy

As I’ve already observed, data integrity checks that are intrinsic to a Relational Database when using the Relational Design approach are largely absent in the EAV model.
It is possible to implement them “manually” but this requires a fair amount of effort.
The sort of things we’re talking aobut here include :

  • datatype of input data
  • maximum allowable size of input data ( length of a VARCHAR, number of decimal places in a numeric value etc)
  • uniqueness of a record with in an entity( there’s only one Deb !)
  • parent-child relationships between attributes (e.g. an Employee must be a member of an existing Department)

All of that additional code will add complexity to your application.
Of course, there is a design decision to be made in terms of how much of this you implement in EAV.
It’s often the case that a trade-off occurs between effort and data integrity.
The upshot is that EAV applications can be susceptible to data corruption ( orphaned records, duplicate records etc).

The Learning Curve

As I’ve said before, a Table Relationship Diagram for a Relational data model is a bit like a map. Following the relations between tables makes it relatively easy to learn how the data in an application hangs together.
No such map is readily available with EAV. As a result, the learning curve for new developers or support analysts tends to be significantly steeper.
This issue is exacerbated where the application employs a significant amount of dynamic SQL.
Such code is much harder to understand and debug as, before anything else, you have to figure out the SQL statement that is actually executed.

EAV as a replacement for Relational Design

The EAV pattern looks very appealing at first glance.
At first glance is quite appropriate here. The upsides of EAV are usually most apparent during design and development.
The downsides don’t tend to manifest themselves until after the application has gone live.
There are some problem domains, such as online catalogs, where EAV is put forward as the best solution for holding volatile, sparsely populated attributes.
Based upon my own first-hand experience of supporting applications in which EAV has been employed, I would be inclined to treat such assertions with some scepticism.

I think that the key point to bear in mind is that no system is perfect out-of-the-box.
Whatever design pattern you employ, be it Relational or EAV, there will be gaps – stuff you just haven’t thought of or anticipated. This could be in the model itself, the reporting requirements, or the data and usage volumes.
In such circumstances EAV would appear to provide a much smaller margin of error. If you don’t get it spot-on first time, you’re in big trouble. Even if you do, the scope for the evolution of your application is far more limited by it’s design than would be the case for the Relational approach.

I’m not the only one to have had direct and painful experience of supporting an EAV designed application on a Releational Database. This particular horror story is an extreme example, but does encapsulate some of the problems commonly encountered with this pattern.

So, is there absolutely no place for EAV in a Relational Database ? Well, I wouldn’t say that….

The Thingy Table

One of the first ( and best) Oracle books I read, many years ago, was Ed Koch’s Oracle 7 – Complete Reference.
In the book, Koch made the point that sometimes, you have an item of data that just doesn’t relate to anything else.
To store this data, Koch advanced the concept of the “Junk Drawer” table.
An example of such a data item might be the Company Registration Number, which goes on the footer of all correspondence generated from the application.
Rather than hard-coding this value in every letter generation program, you could simply store it in a table.
Other instances where such a table would be useful might include values that can be looked up from stored program units.
Some companies have a shutdown period over the Christmas and New Year period. The dates of the shutdown vary from year to year. Storing the start and end dates for the shutdown in a table would enable any scheduled jobs over the period to reference these values and then simply terminate if they were running during the shutdown.

Of course, such items could just be declared in a PL/SQL package header. However, changing them would cause invalidations of any database stored program unit dependent on the package in question.
With the table approach, this is not an issue.
Such a table would probably look somewhat familiar to the EAV attribute_values table :

CREATE TABLE application_params
(
    param_name VARCHAR2(30) PRIMARY KEY,
    char_val VARCHAR2(4000),
    num_val NUMBER,
    date_val DATE
)
/

--
-- Company Registration Number
--
INSERT INTO application_params( param_name, num_val)
VALUES( 'COMPANY_REG_NO', 123456)
/

--
-- Some batch job may be checking this range to make sure they don't run 
-- over the Christmas Shutdown
--
INSERT INTO application_params( param_name, date_val)
VALUES('XMAS_SHUTDOWN_START', TO_DATE('25-DEC-2013', 'DD-MON-YYYY'))
/

INSERT INTO application_params( param_name, date_val)
VALUES('XMAS_SHUTDOWN_END', TO_DATE('05-JAN-2014', 'DD-MON-YYYY'))
/

commit;

To be clear, this is not the same as taking an OTLT ( One True Lookup Table) approach where a number of entities, each with only a few rows, are lumped together in one big table.
This is more a home for these, rare, data items that don’t relate to anything else.

Care needs to be taken about what data is held in this table.
There is always a danger that inappropriate data items will be added and the table can quickly grow from a few application configuration values into several thousand rows.
When this happens, the advantage of having this table is lost and there is a high probability that all of the problems associated with EAV will manifest themselves.

Conclusion

EAV is one of those oh-so-tempting but oh-so-dangerous design patterns in the context of RDBMS.
Extreme caution ( and some serious analysis) needs to be done before implementing it in your database.
If you decide it really is the solution to all of your problems, then consideration needs to be given as to whether an RDBMS is what you should be using in the first place.
I’m in the good books (with Deb, at least) as I’ve volunteered to cook Christmas Dinner, and do the washing up afterwards.
Hopefully that take long enough to avoid the Soap Christmas Special extravaganza that’s otherwise coming my way.

Author: mikesmithers

Back in 1993, I discovered that I could get paid money for doing fun stuff with computers. Over the years, I've specialised in Oracle Databases as a developer, a DBA and sometimes, an architect. It's my evil alter-ego - The Antikyte - who writes a blog about my various technical adventures. Yes, that is his Death Star parked in the Disabled Bay. I currently live in the South-West of England with Deb, my long-suffering wife.

29 thoughts on “The Anti-Pattern – EAV(il) Database Design ?”

  1. Nice post! You’ve nailed the disadvantages of EAV. Another one for your list of data integrity issues is that there’s no way to make an attribute mandatory, as we would in a conventional design by using NOT NULL.

    Basically, when you use EAV, you lose the value of the RDBMS. You have to write a lot of application code to reimplement features that are already provided by the RDBMS. It’s likely that you will have bugs in your code (all code has bugs).

    I’ve written about EAV here: https://karwin.blogspot.com/2009/05/eav-fail.html

    I always surprised at how much pushback I get from developers when I warn that EAV is a non-relational design. It seems it’s a favored architecture of many, and they’re not willing to give it up or even admit to its weaknesses.

    Like

    1. Bill,

      I like your post – I thought the picture was especially pertinent !

      You’re absolutely right. Nullability is another of those things that you have to code into an EAV based application.

      Mike

      Like

    1. Rodger,

      Thanks for the link.

      Reading your post I’d guess that, like me, you’ve experienced the joys of trying to support an application based on EAV running on Oracle.
      I think we’re agreed that if EAV is part of the solution, then a Relational Database probably isn’t.

      Mike

      Like

  2. Would an EAV approach be appropriate for an application in which end users have flexibility to design or adjust data schemas through the application? Is there a better approach for such flexibility that fits better with an RDBMS?

    Like

    1. Misha,

      Is this a genuine requirement ? Could you achieve the required functionality via other means – i.e. through
      a structured data modelling exercise ? Do they really need to create new attributes, or are they adding data that the current data model does not accommodate ?
      If so then, I’d suggest that such an approach would save you large amounts of pain throughout the life of the application.
      Relational databases are really good at managing data that’s stored in a Relational Structure. EAV is not relational.
      If users absolutely must be able to do this then a No SQL database is probably better suited to such a requirement.

      HTH

      Mike

      Like

  3. “There are some problem domains, such as online catalogs, where EAV is put forward as the best solution for holding volatile, sparsely populated attributes.
    Based upon my own first-hand experience of supporting applications in which EAV has been employed, I would be inclined to treat such assertions with some scepticism.”

    In this context of online catalogs, what would you suggest then ?

    Like

    1. Cyril,

      Having considered this question carefully, the best answer I can come up with is “it depends…”
      Can I come up with a sensible relational model to support the requirement ?
      What are the data and traffic volumes likely to be ?
      What is the lead time on implementing data model changes ?
      How important is data integrity to this application ?
      Am I restricted to implementing this in an RDBMS, or am I starting from scratch ?

      I used the word “scepticism” deliberately. I’m not saying that EAV isn’t the best approach to solve this type of problem.
      I am saying that I would need some convincing that this was the case.

      Mike

      Like

  4. Hi Mike.
    I really admire your skills of nailing the problems. I’ve already had an opportunity to work with a team of Java developers who wanted to have EAV model for a quite static schema.
    It seems to me that EAV or the schema-less approach is only suitable in situations where you handle unstructured data or data of unpredictable/flexible structure.
    Though many modern and also not so modern structured formats (XML/JSON/JAML) support varying data

    Like

    1. emm (tab+enter did it’s job).
      ..varying data structures, there is always some structure, even if it’s varying.
      going to the point. EAV is great for use case as you’ve described (parameters table), if you need flexibility beyond Relational model you may always store XMLTYPE, CLOB or Object Type (with inheritance model) in your database.
      Though I’ve never seen actual use case for those that could not be replaced with decent relational approach.

      Anyway, thanks for blogging. I really enjoy reading your posts.
      Cheers
      Jacek

      Like

  5. Here are my personal thoughts after almost 20 years of development , it is literally a stupid design !!!, you should use NoSql for that instead of eliminating all the DBMS features and for what !!!

    When you design a system you should see the big picture it is not only about storing data it is about how do you manage to consume this data, that approach it is hell in reports, or exposing it to UI components such as grids and trees, you will loose all the goodies like column sorting or filtering features bundled with these plug and play components

    Bottom line hell no

    Like

  6. While I agree with most of your article, there’s a few points where I feel you a really introducing a “strawman” by positing a really dumb EAV implementation, then arguing it behaves in a dumb way.

    E.g. “data integrity checks that are intrinsic to a Relational Database when using the Relational Design approach are largely absent in the EAV model.” Somewhere in any proper EAV system there should be a table which defines the attributes of a class. The record that defines an attribute can trivially include data validity rules – possibly much more sophisticated ones than possible in a simple database. Yes, the application middleware layer is going to have to do the checking, but I don’t see why having Microsoft’s application (e.g. SqlServer) checking the data has to be intrinsically better than someone else’s application checking it.

    One interesting feature that can be built into an EAV database is “attribute ordinality”. For example, given a ‘Person’ class (equals table in a relational database), you might have a column for ‘EyeColour’. One day you find that the client wants to be able to store ‘one eye is blue, the other green’. In a relational database you have a number of choices – all ugly. You can treat the field as free text, add an ‘other eye’ column, force the input to “mixed”, or haul the data off to another table. But if your EAV database has the capability, then all you have to do is change the metadata of the ‘EyeColour’ attribute from (0 or 1 entries allowed) to (0, 1, or 2 entries allowed).

    Another example of EAV flexibility – which is able to be validated – would be to define an attribute as ‘class A attribute X points to class Y’. i.e. a primary key/foreign key relationship, the bread and butter of a relational database. However, there are very few databases where you can define a FK as ‘class A attribute X points to one of . As a farming example, you might want to record that “chickens are poisoned by the plant SPECIES Rhubarb, the plant GENUS citrus, and the whole plant FAMILY Solanaceae”.

    Performance … yeah. I agree, you’re never going to get blazing performance on a large EAV database without some really hard work.

    Just some thoughts.

    OmnivoreNZ, programming since 1973.

    Like

  7. A really excellent article. I have been looking for something like this for some time. I think that you really nailed it. Thanks

    Like

  8. I am working in a giant telecom company, we have huge number of tables where many columns data is not required to be stored hence my company is following open schema(EAV model) to store data otherwise really difficult if RDBMS is being used instead of EAV. It is challenging work but good to learn new things isn’t ? so why against of EAV model.

    Like

    1. Trass,

      I have no problem with EAV per se. Where this pattern becomes problematic is when you try to shoe-horn it into an RDBMS.
      Then the problems that I’ve described here begin to manifest themselves.

      Mike

      Like

  9. Great article, thanks for the effort you’ve put into this. I’m a DBA, BI Dev and IT Architect of several years, and am researching the pitfalls of EAV, as I am building a large application which will use an EAV type structure to store certain data objects that are highly variable in what they contain. I tossed up the use of JSON, but thought better given that I’ll need to report and retrieve against certain attribute values, which would benefit from indexing.

    I agree somewhat with @omnivornez above who claims that the example given is a strawman argument. I think the basic layout of EAV used here is fatally flawed, but accurately reflects the state of play in the real world, so is very relevant to the reader. Most EAV systems that I’ve seen fall into the trap of having the metadata and data stored in the same table. As I see it, the data needs to be stored as both metadata (abstract) and data (concrete):

    AttributeType (abstract – stores non-affiliated attributes, and metadata on them, e.g. code, description, type)
    EntityType (abstract) – stores different entity types, basically a lookup table (e.g. “Department”, “Person”, “Vehicle”)
    EntityTypeAttribute (abstract) – a map of the attributes of an entity type, bridging AttributeType.ID and EntityType.ID
    Entity (concrete) – an actual entity with EntityType.ID (e.g. John Smith, eMarket, Supercar)
    EntityAttribute (concrete) – a map of an concrete entity’s concrete attributes, bridging with Entity.ID and EntityTypeAttribute.ID
    EntityAttributeValue (concrete) – stores the values of an entity’s attributes, with EntityTypeAttribute.ID and sparse value columns for each type. Optionally, add a column “isCurrent” so that historical information can be stored.

    Using this sort of set up, you end up with small table sizes for EntityType and AttributeType, which you can index heavily, and are very performant. EntityAttributeType is moderately sized, and can be indexed on it’s FKs, allowing good leverage against the upstream EntityType and AttributeType indexes. EntityAttribute is a large table, but again benefits from the relation to EntityTypeAttribute and so on. That leaves EntityAttributeValue which can be sparse indexed on pairs like (EntityAttributeID, IntegerFIeld) and (EntityAttributeID, BooleanField)

    I’ve implemented this sort of structure in mid-sized applications before with no tangible ill effects other than SQL queries are more complex, and the DBA has to be on top of indexes and auto-vacuums/reindexing jobs. I haven’t tried this on very large applications though. Thoughts?

    Like

    1. Kjartan,

      apologies for the delay in responding. This is the latest of several drafts of this reply.

      What you are proposing is a replication of some of the functionality offered by an RDBMS Catalog ( the Data Dictionary in Oracle terms).
      In my experience this approach does nothing to resolve the issues of EAV outlined in the article.
      It may be worth taking a step back and asking yourself whether you need a Relational Database at all for your particular application.
      If all of your application logic, including data logic, is to be implemented in a tier above the database and the database itself is merely a persistence layer, then an RDBMS is just as likely to get in the way as to provide this facility. In such circumstances, you may consider a noSQL solution where you access persisted data via some non-SQL solution (e.g. Spark) may be more sensible.
      If you do conclude that a relational database is a must for your application, then storing your data in a relational model will go a long way to getting the most out of the RDBMS.
      HTH

      Mike

      Like

      1. Hi Mike,

        Thanks for the reply. We’ve decided to go with a hybrid approach, which I’ll try to explain here. The particular problems we are trying to solve are: maintaining relational aspects where possible, having objects that are polymorphic within their supertype, having objects which are entriely dynamic within their subtype, reporting on a few key data points, and storing historical data on the fly (i.e. self-data warehousing).

        For this we have split data into two classifications – first class data and second class data. All first class data is being stored in the classic 4NF relational tables and comprises most of the reportable data, and also the essentially immutable data (product names, account usernames, etc). Second class data is stored in EAV tables, and comprises data that we either need to keep historical record of changes on, or is non-essential in terms of reporting. Data that is dynamic but that we desperately need to report on either gets duplicated into first class columns (the data layer of the central web service takes care of this), or is made accessible via Views that mine out the particular data point in that EAV supertype.

        The EAV tables are not a big mess of everything – there’s 6 critical tables and 2 optional tables for each supertype, 8 tables in total. These tables are stored as generic non-populated tables in a separate schema, and then a new schema is taken as a clone of that schema when we need a new EAV supertype. Currently, we’re looking at the biggest table putting on about 50-100k rows/year. Also, being in PostgreSQL means that if we ever have issue with the EAV schema being slow or unneccesary, we can always transform portions of the data to the NoSQL-esque JSONB type, saving on space and processing time.

        I’ve seen some bad EAV implementations in my time, and while I don’t think it’s a good fit for all data problems (and you’re right, NoSQL often ticks all the boxes), for what we’re doing I’m trying out this approach as it fits better than other solutions. It’s possible in future that we will migrate some data/functionality/responsibility to other processes or platforms. Sometime in the future when it’s proven itself either successful or a failure, I’ll write up a blog about it, and post a link back here for future readers.

        Cheers,
        Kjartan

        Like

  10. The one thing no-one’s touched on yet for EAV is the ease of creating new relationships, new routes through the database etc. It’s so easy with EAV.
    In fact, you don’t even need to think about it at all – just bung a few more rows in the three tables and job done (that’s your immediate job for the task you’re currently on). No need to check if it’s being done anywhere else, or whether your new route through the database now allows a contradiction into the (lack of) design [i.e. you can get from business-meaningful thing A to F via B and C and route and via D and E the other: when you use route 1 then thing A with id 12345 links to thing F id 54321. Going via the other route, 12345 links to 5430. Which is true?] You’ve broken the “single point of truth” idea because you didn’t have to think about the overall design – which is vital in a system of record.

    Like

    1. Martin,
      I’m having flashbacks reading your description. Absence of data model design in a System of Record is, as you point out, likely to make your life rather interesting.

      Mike

      Like

  11. Hi,

    This article is really useful, and I enjoyed reading it.

    We have a platform which was originally planned to use EAV, but after reading further into the pros and cons, and performance testing a prototype with large datasets, we realised it was not viable and settled on a JSON field on each table to allow for user specified fields. For example, a users table has the required essentials specified by the design, such as username, password hash, email address, first and last name, password reset date, etc, but as well as this the user can add their own custom fields.

    JSON fields are far superior to EAV designs (depending upon database) because they can be indexed, queried, sorted, joined etc very easily using built in SQL functions.

    The only downside to JSON fields is they’re very implementation specific, we couldn’t move to another platform without significant re-engineering work and rewriting a lot of our stored procedures.

    Performance wise, they walk all over EAV leaving it in the dirt.

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.