Why Elasticsearch was externalized in dotCMS

Will Ezell

For more than 10 years, dotCMS has leveraged Elasticsearch (ES) as our search engine for content and site searches. Older versions of dotCMS shipped with Elasticsearch embedded in the product itself, but since dotCMS 5.3, we have externalized Elasticsearch from the core CMS product, a change that brings many benefits to dotCMS installations. Externalizing Elasticsearch required some choices about how content and content relationships are indexed in ES; specifically, we no longer index the parent of a content relationship, only its children.

When we made these choices, we knew they would affect a small subset of customer implementations that queried content in Elasticsearch directly by parent. Still, we believe the benefits of running externalized Elasticsearch outweigh the possible downside of having to reimplement some content queries. We have tried to provide methods and means that let customers work around these changes and take advantage of all the operational benefits that externalized Elasticsearch provides.

Let’s dig a bit deeper into the challenges we addressed in the core product by externalizing Elasticsearch.

Removing Customization | Before this change, dotCMS required a custom Java plugin to be installed on Elasticsearch. That plugin stopped us from externalizing or upgrading Elasticsearch and prevented us from using managed Elasticsearch services like Amazon OpenSearch Service, which do not allow plugins. Needless to say, getting rid of custom plugins is always a good thing from a maintenance and support (total cost of ownership) perspective. We now rely on Amazon's hosted Elasticsearch service in dotCMS Cloud, and customers running dotCMS on their own servers or cloud infrastructure are adopting the same architecture.

Memory usage | Before dotCMS 5.3, the way relationships were indexed in Elasticsearch created tens of thousands of "fields", one for each relationship-parent-child combination (Google “Elasticsearch mapping explosion” if you want to learn more). These fields took up huge amounts of memory in Elasticsearch's internal cache, were completely invisible to dotCMS, and could take dotCMS instances down with out-of-memory errors. Elasticsearch is a big part of the reason older (pre-5.3) dotCMS instances need so much memory. Reduced memory requirements have a direct, positive impact on hosting costs and make running dotCMS more economical.
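To make the "mapping explosion" concrete, here is a small hypothetical Python sketch that counts distinct field names under two simplified indexing schemes. The field-naming patterns (`{relationship}-{parent}` versus one field per relationship type) are assumptions for illustration only, not dotCMS's actual index mapping; the point is that a scheme whose field names embed a parent grows with the number of parents, while a scheme keyed by relationship type stays constant.

```python
from typing import Iterable

# (relationship name, parent identifier, child identifiers); illustrative data model
Relationship = tuple[str, str, list[str]]


def old_scheme_fields(rels: Iterable[Relationship]) -> set[str]:
    # Assumed pre-5.3 style: children are indexed with a dynamic field whose
    # name embeds the relationship and a specific parent, so every new
    # parent adds another field to the index mapping.
    return {f"{rel}-{parent}" for rel, parent, children in rels if children}


def new_scheme_fields(rels: Iterable[Relationship]) -> set[str]:
    # Assumed externalized style: the parent document stores its children's
    # identifiers in one field per relationship type, so the mapping stays small.
    return {rel for rel, _parent, _children in rels}


# 10,000 parents, each with two children, all in one relationship type
rels = [("blogCategory", f"parent{i}", [f"child{i}a", f"child{i}b"]) for i in range(10_000)]
print(len(old_scheme_fields(rels)))  # 10000
print(len(new_scheme_fields(rels)))  # 1
```

Because every distinct field name becomes part of the index mapping that Elasticsearch keeps in memory, the first scheme's memory footprint scales with content volume rather than with the content model.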

Re-indexing Performance | Indexing both a content item's parents and its children also meant that saving a piece of content with child relationships required reindexing not only that content but also every one of its children. In the wild, we saw customers using relationships like categories, where a single piece of content had tens of thousands of children. Editing the parent of such a relationship forced a reindex of all of those children, which was a performance killer. With externalized Elasticsearch, there is no performance penalty when reindexing a piece of content that has thousands of children.
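The reindex fan-out described above can be sketched as follows. This is a hypothetical model, not dotCMS code: `children_of` stands in for the relationship data, and the two functions simply return which documents would need to be rewritten when one piece of content is saved under each indexing scheme.

```python
# Hypothetical relationship data: one "category" parent with 20,000 children.
children_of: dict[str, list[str]] = {
    "categoryNews": [f"article{n}" for n in range(20_000)],
}


def docs_to_reindex_old(saved_id: str) -> list[str]:
    # Old scheme (as described in the post): children carry parent data in
    # their own documents, so saving a parent rewrites the parent AND every child.
    return [saved_id, *children_of.get(saved_id, [])]


def docs_to_reindex_new(saved_id: str) -> list[str]:
    # Externalized scheme: children no longer store anything about their
    # parent, so only the saved content's own document is rewritten.
    return [saved_id]


print(len(docs_to_reindex_old("categoryNews")))  # 20001
print(len(docs_to_reindex_new("categoryNews")))  # 1
```

The cost of a save in the old model grows linearly with the number of children, which is why category-style relationships with tens of thousands of children were so expensive to edit.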

Multi-tenant Elasticsearch | When implementing externalized Elasticsearch, we understood that we were adding a bit of operational complexity to running a dotCMS installation. To minimize that, we made sure that multiple dotCMS environments can share the same Elasticsearch instance or cluster without stepping on each other's toes. This means all of a customer's environments need only a single Elasticsearch instance or cluster to run effectively and cost-effectively.

Upgrading | When dotCMS changed the way relationships are queried, we tried to be mindful of customer implementations and provided similar methods in Velocity scripting that automatically rewrite your queries so you can still query for a parent's children. Under the covers, these methods pull the list of children and add them to the query, which is the same thing a raw API query has to do. For performance, the list of children is cached; the resulting query is not pretty, but it is effective and it works.
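The rewrite idea can be illustrated with a short sketch. The post does not show the Velocity methods themselves, so everything below is hypothetical: `RELATIONSHIPS`, `child_ids`, and `rewrite_query` are illustrative names, not dotCMS APIs, and the relationship lookup is an in-memory dict standing in for the real data source. It only demonstrates the pattern the paragraph describes: look up a parent's children (with caching), then append them to a Lucene-style query as an explicit `identifier` filter.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical stand-in for the relationship store: (parent, relationship) -> children
RELATIONSHIPS: dict[tuple[str, str], tuple[str, ...]] = {
    ("parent123", "blogComments"): ("child1", "child2", "child3"),
}


@lru_cache(maxsize=1024)
def child_ids(parent_id: str, relationship: str) -> tuple[str, ...]:
    # Cached lookup, mirroring the post's note that the children list is cached.
    return RELATIONSHIPS.get((parent_id, relationship), ())


def rewrite_query(base_query: str, parent_id: str, relationship: str) -> Optional[str]:
    """Turn 'children of parent_id' into an explicit identifier filter."""
    ids = child_ids(parent_id, relationship)
    if not ids:
        # No children: the caller can skip the search entirely.
        return None
    # "Not pretty, but effective": one OR clause per child identifier.
    return f"{base_query} +identifier:({' OR '.join(ids)})"


print(rewrite_query("+contentType:Comment", "parent123", "blogComments"))
# +contentType:Comment +identifier:(child1 OR child2 OR child3)
```

With a parent that has thousands of children, the generated clause becomes long, which is why caching the children list matters for repeated queries.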

Goodbye Split Brain | We have seen a number of larger and older dotCMS implementations hit Elasticsearch split-brain issues, where the Elasticsearch cluster elects two different nodes to act as master and the indexes get out of sync. Since we externalized Elasticsearch, we simply don't see these issues. In fact, we've seen none with large-scale implementations on recent LTS versions (21.06 and 22.03).

Why You Should Upgrade to the Latest LTS Version

Some dotCMS customers have significant customizations around Elasticsearch, and for them the impact of upgrading could be equally significant. Still, the advantages of externalized Elasticsearch and more recent LTS versions of dotCMS are there for the taking:

  • Stability | About 90% of dotCMS customers run a recent LTS version (21.06 or 22.03), since these are the most stable and reliable versions and issues in them can be resolved quickly.
  • Support window | LTS releases are supported for 18-24 months, and when kept current with LTS patches, upgrading from one LTS release to a newer one is a matter of minutes to hours.

Why You Should Migrate to dotCMS Cloud

If you want to take your dotCMS-powered platform to the next level, a migration to dotCMS Cloud could certainly help. It would allow your technology team to focus on building innovative apps and supporting the business, instead of managing infrastructure. dotCMS Cloud will have a direct impact on the total cost of ownership of your digital platform:

  • System Administration | Getting rid of system administration activities can save 1-2 FTEs, depending on the size of your dotCMS infrastructure. That easily saves you $200,000 – $300,000 per year.
  • Annual CMS Upgrades | Included in any dotCMS Cloud subscription is the annual core CMS upgrade (LTS to LTS). Some 3rd-party dotCMS managed hosting providers charge $30,000 - $50,000 for just one upgrade, which would not be necessary if hosted in dotCMS Cloud.
  • Performance Support | Performance Support (24/7 critical care) is included for HA production/live environments on dotCMS Cloud, whereas 3rd-party managed hosting providers charge anywhere between $25,000 and $50,000 per year. With dotCMS Cloud, you can keep that expense in the till.
  • Cloud Self-Services | As mentioned in our Spring Updates 2022 webinar, dotCMS is working on a Cloud Control app that enables a plethora of self-service scenarios, including but not limited to ad-hoc back-ups & restores, spinning up new environments, and many more.
  • SOC2 (Type II) | dotCMS is SOC2 (Type II) certified for dotCMS Cloud and for the processes and controls supporting the product and its customers, so customers can be assured their platforms are secure.
  • Power Purchase Agreement | dotCMS procures a lot of AWS cloud products and has significantly better purchasing power than individual enterprises or 3rd parties offering managed hosting. This allows us to offer a competitive product that makes sense for everyone involved. 
  • Tiered Cloud resources | At dotCMS, we don’t believe in one-size-fits-all. Our tiered cloud offering gives you the flexibility for the different environments you need to support your business-critical applications.
  • Enterprise SPA Hosting | dotCMS Cloud now offers Enterprise SPA Hosting as part of our standard product offering. It allows development teams to increase their focus on innovation and building engaging applications to support the business outcomes, instead of managing underlying infrastructure.
Image Credit: Photo by Niklas Ohlrogge on Unsplash
Will Ezell
Chief Technology Officer
May 11, 2022
