For more than 10 years, dotCMS has used Elasticsearch (ES) as its search engine for content and site searches. Older versions of dotCMS shipped with Elasticsearch embedded in the product itself, but since dotCMS 5.3 we have externalized Elasticsearch from the core CMS product, which provides many benefits to dotCMS installations. In externalizing Elasticsearch, we had to make some choices about how content and content relationships are indexed in ES. Specifically, we no longer index the parent of a content relationship, only a content relationship’s children.
When we made these choices, we did so knowing that they would affect a small subset of customer implementations that directly queried content in Elasticsearch by parents. Still, we believe that the benefits of running externalized Elasticsearch outweigh the possible downside of having to reimplement some content querying. We have tried to provide ways for customers to work around these changes and take advantage of all the operational benefits that externalized Elasticsearch can provide.
Let’s dig a bit deeper into the challenges we addressed in the core product by externalizing Elasticsearch.
Removing Customization | Prior to this change, dotCMS required a custom Java plugin to be installed on Elasticsearch. That plugin stopped us from externalizing or upgrading Elasticsearch and prevented us from using managed Elasticsearch services like Amazon's OpenSearch, which do not allow plugins. Getting rid of custom plugins is always a good thing from a maintenance and support perspective (Total Cost of Ownership). We now rely on Amazon's hosted Elasticsearch service in dotCMS Cloud, and we have customers adopting this architecture while running dotCMS on their own servers or cloud infrastructure.
Memory usage | In dotCMS versions before 5.3, the way relationships were indexed in Elasticsearch created tens of thousands of "fields", one for each relationship-parent-child combination (Google “Elasticsearch mapping explosion” if you want to learn more). This took up huge amounts of memory in Elasticsearch's internal cache, was completely invisible to dotCMS, and could take dotCMS instances down with Out-Of-Memory errors. Elasticsearch is a big part of the reason why our older (before dotCMS 5.3) instances need so much memory. Reduced memory requirements have a direct and positive impact on hosting costs and make running dotCMS more economical.
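To make the field growth concrete, here is a minimal sketch that only counts field names under two layouts. The naming scheme (`relationship.parent.child`) and the sample data are hypothetical and are not taken from the dotCMS index mapping; the point is only how the number of mapped fields scales.

```python
def field_names_per_triple(triples):
    """Assumed pre-5.3 style: one mapped field per relationship-parent-child triple."""
    return {f"{rel}.{parent}.{child}" for rel, parent, child in triples}


def field_names_per_relationship(triples):
    """Alternative layout: one field per relationship, with related ids stored as values."""
    return {rel for rel, _parent, _child in triples}


# Hypothetical data: 100 blog posts, each with 100 comments.
triples = [
    ("Blog-Comment", f"blog{b}", f"comment{b}-{c}")
    for b in range(100)
    for c in range(100)
]

exploded = field_names_per_triple(triples)
compact = field_names_per_relationship(triples)
print(len(exploded), len(compact))
```

With one field per triple, the mapping grows with every new related pair; with one field per relationship, it grows only when a new relationship type is defined.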
Re-indexing Performance | Indexing both a content's parents and a content's children also meant that when a piece of content with child relationships was saved, we had to reindex not only that piece of content but also all of its children. In the wild, we saw customers using relationships like categories, where a single piece of content had tens of thousands of children. Editing the parent of such a relationship forced a reindex of every one of those children, which was a performance killer. With externalized Elasticsearch, there is no such penalty when reindexing a piece of content that has thousands of children.
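The fan-out described above can be sketched as a small function that returns the set of documents touched by a single save. The function name, the `children_store_parent` flag, and the sample identifiers are illustrative assumptions, not dotCMS APIs.

```python
def docs_to_reindex(content_id, children_of, children_store_parent):
    """Return the set of documents that must be reindexed when content_id is saved."""
    docs = {content_id}
    if children_store_parent:
        # Old behavior: each child's document also carries its parent,
        # so saving the parent touches every child document too.
        docs.update(children_of.get(content_id, ()))
    return docs


# Hypothetical parent with 5,000 related children.
children_of = {"category-1": [f"item-{i}" for i in range(5000)]}

old_cost = len(docs_to_reindex("category-1", children_of, children_store_parent=True))
new_cost = len(docs_to_reindex("category-1", children_of, children_store_parent=False))
print(old_cost, new_cost)
```

Under these assumptions, the old layout touches the parent plus every child, while the new layout touches only the saved document.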
Multi-tenant Elasticsearch | When implementing externalized Elasticsearch, we understood that we were adding a bit of operational complexity to running a dotCMS installation. To minimize that, we made sure that multiple dotCMS environments can share the same Elasticsearch instance or cluster without stepping on each other’s toes. This means that all of a customer’s environments need only a single Elasticsearch instance or cluster to run effectively and cost-effectively.
Upgrading | When dotCMS changed the way relationships were queried, we tried to be mindful of customer implementations and provided similar methods in Velocity scripting that automatically rewrite your queries so that you can still query by relationship. Under the covers, these methods pull the list of children and add them to the query, which is what shows up in the raw API query. For performance, the list of children is cached. The resulting query is not pretty, but it is effective and it works.
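The rewrite described above can be sketched as follows. This is an illustration of the idea only, not the dotCMS implementation: the relationship store, the function names, the `contentType` and `identifier` field names, and the placeholder used for an empty result are all assumptions.

```python
from functools import lru_cache

# Hypothetical relationship store: parent identifier -> child identifiers.
RELATED_CHILDREN = {
    "parent-123": ("child-a", "child-b", "child-c"),
}


@lru_cache(maxsize=1024)
def children_of(parent_id):
    """Look up a parent's children once and cache the result for later queries."""
    return RELATED_CHILDREN.get(parent_id, ())


def rewrite_parent_query(content_type, parent_id):
    """Turn 'children of this parent' into an explicit list of child identifiers."""
    children = children_of(parent_id)
    if not children:
        # No children: use a placeholder id (assumed to match nothing)
        # so the query does not fall back to matching everything.
        return f"+contentType:{content_type} +identifier:__none__"
    id_clause = " ".join(f"identifier:{child}" for child in children)
    return f"+contentType:{content_type} +({id_clause})"


print(rewrite_parent_query("Comment", "parent-123"))
```

The output grows with the number of children, which is why the generated query can look unwieldy even though it returns the expected content.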
Goodbye Split Brain | We have seen a number of larger and older dotCMS implementations hit Elasticsearch split-brain issues, where the Elasticsearch cluster elects two different nodes to act as master and the indexes get out of sync. Since we externalized Elasticsearch, we simply don’t see these issues. In fact, we’ve seen none in large-scale implementations on recent LTS versions (21.06 and 22.03).
There are dotCMS customers with significant customizations around Elasticsearch, and the impact could be equally significant, but the advantages of externalized Elasticsearch and more recent LTS versions of dotCMS are there and up for grabs:
If you want to take your dotCMS-powered platform to the next level, a migration to dotCMS Cloud could certainly help. It would allow your technology team to focus on building innovative apps and supporting the business, instead of managing infrastructure. dotCMS Cloud will have a direct impact on the total cost of ownership of your digital platform: