Why SysOps, DevOps don't like to add custom fields on large instances of Jira?

 Hi, awesome community, 

 

Each time about Jira performance degradation I read a recommendation for application level is do the cleanup of custom fields. In short, that’s a great option for performance, UX and date clean purposes. 

Sometimes a good explanation of the existing situation will help many administrators, developers, Jira configurators to understand why we need to pay performance for our flexibility. 

In this article, I will describe the standard type of custom fields, as nonstandard can be implemented in another way with another side effect and next time will describe why configuration context is good and for what it was implemented.

 

Fundamentally

Generally, Custom fields based on 3 tables in Jira, you can check here for different major schemes. (e.g. for the newest schemes you can see split user picker filter). 

Снимок экрана 2020-03-01 в 02.52.08.png

 

So that’s the EAV data model with small improvements. (https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)

 

Small practical questions

Let’s imagine how we can pick an issue with one custom field? What about 10? 100? How many joins on SQL level do you need to do? :) 

So let’s fetch all fields for example, and understand we are paying for our flexibility like string-value, number-value, date-value, etc.

 

Just an another example:

select jp.pkey, ji.issuenum, cf.cfname, cfv.stringvalue, cfv.numbervalue

from project jp, jiraissue ji, customfield cf, customfieldvalue cfv

where jp.id = ji.project

and cfv.issue = ji.id

and cf.id = cfv.customfield

order by jp.pkey, ji.issuenum

 

Which is works provided with limit 10000, during 3.2 with 700 custom fields, and 1500 fields and for the same issues data set instance. (using that tool https://marketplace.atlassian.com/apps/1210725/data-generator-for-jira and on the absolutely the same instance, and query was executed 10th times)

 

Then you will ask how does it work in Jira? Just easy: Caching, working on Java app-level a lot of and improving ORM, and reviewing the existing indexes, and adding the additional table for a specific type like user picker.

So you will ask why it is not possible easy to change into another data model. Hm, I would say at the moment, it’s impossible for on-premises solutions. Therefore the time to go -> Cloud. 

 

So historically..?

Or let’s investigate deeper, what was around 20 years ago? e.g. JSON/JSONB in PostgreSQL was not implemented yet.[https://www.compose.com/articles/is-postgresql-your-next-json-database/

Also, in that period to have the EAV data model in your CMS was normal, because you need to provide a prototype, then MVP and then it can be kept on production. :) 

 


What about community opinion?

Also, you can read threads related to the EAV. 

The favourite answer from one of the thread is:

There's a few general pros and cons I can think of, there are situations where one is better than the other:

Option 1, EAV Model:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add (might even be added by users?)
  • Pro: "generic" interface components
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple reports
  • Con: complex reports can become almost impossible
  • Con: poor performance for large data sets

Option 2, Modelling each entity separately:

  • Con: more time required to gather requirements and design
  • Con: new entities must be modelled and designed by a professional
  • Con: custom interface components for each entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)

  • Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
  • Con: new entities must be modelled and designed by a professional
  • Pro: new attributes might be easily added later on
  • Con: complex code required to validate simple data types (for the custom attributes)
  • Con: custom interface components still required, but generic interface components may be possible for the custom attributes
  • Con: SQL becomes complex as soon as any custom attribute is included in a report
  • Con: good performance generally, unless you start need to search by or report by the custom attributes

I'm not sure if Option 3 would necessarily save any time in the design phase.

Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.


What about another system?

How one of the issues tracking system used the custom field? For example, the Redmine used the same idea, 

https://www.redmine.org/projects/redmine/wiki/DatabaseModel but that model will meet with the next problem https://en.wikipedia.org/wiki/Sparse_matrix often even Jira.

An additional:  https://inviqa.com/blog/understanding-eav-data-model-and-when-use-it

Conclusion

Hope It's clear the Pay Flexibility and Performance.

You can select the unused fields, etc. e.g. example

select count(*), customfield.id, customfield.cfname, customfield.description
from customfield left join customfieldvalue on customfield.id = customfieldvalue.customfield
where customfieldvalue.stringvalue is null 
and customfieldvalue.numbervalue is null 
and customfieldvalue.textvalue is null
and customfieldvalue.datevalue is null
group by customfield.id, customfield.cfname, customfield.description;

just checkout that article https://confluence.atlassian.com/enterprise/managing-custom-fields-in-jira-effectively-945523781.html

That you can easy do explanation, governance, cleanup and recommendation to end users.

 

The article can be continued if it’s needed how it’s affected on Lucene indexes and Field Configuration context if someone will be interested. 

P.S. Let’s check Mike’s presentation about Lucene.



Cheers,

Gonchik Tsymzhitov 

Comments

  1. Anonymous8/12/22 12:04

    This comment has been removed by a blog administrator.

    ReplyDelete

Post a Comment

Popular posts from this blog

How only 2 parameters of PostgreSQL reduced anomaly of Jira Data Center nodes

Stories about detecting Atlassian Confluence bottlenecks with APM tool [part 1]

Atlassian Community, let's collaborate and provide stats to vendors about our SQL index usage