Ignore the Spec - CMIS 1.0 is for Web Content Management too
Posted 02.03.2010 | Will Ezell
We’ve followed the birth and maturation of the CMIS (Content Management Interoperability Services) specification (http://www.oasis-open.org/committees/cmis/) for over a year, and as of last December, we have thrown our hat in the ring with the OASIS gang to see CMIS reach its 1.0 milestone. And I have to say that we have been excited to watch the CMIS spec gain credibility. It is the standard the CM world has been waiting for. If interest and adoption continue at their current pace, the content-as-a-service idea behind CMIS could change the landscape of all large-scale web CM implementations. That is, if we ignore some of what the spec is telling us.
The CMIS 1.0 milestone is a tremendous step in the right direction but, at first blush, does not seem quite the right fit for web content management (WCM). In fact, the primary authors have flat out said that WCM is outside the scope of CMIS. We get that. Check. Now tell us: how can we use CMIS to deliver content to users via the web? Because that is what CMIS is going to be used for, whatever the intentions. And not just to deliver supporting documents or heavy assets, but actual web content that may or may not be structured, or that might be made up of multipart content objects.
The most damning criticism of CMIS is that it should really be called DMIS, for Document Management, etc., etc. The issue was first identified by a Day Software developer (can’t find the link, but hat tip) and has since been pondered and amplified by Kas (http://asserttrue.blogspot.com/2009/06/cmis-or-dmis.html) and others. I have to admit that, in perusing the spec, my view was colored by such criticisms, and I found myself on the hunt for concrete examples of the “document management” prejudice. And lo, right there in the opening paragraph of the CMIS domain model, I found that the spec explicitly excludes the idea of serving “compound” documents:
“Specifically, transient entities (such as programming interface objects), administrative entities (such as user profiles), and extended concepts (such as compound or virtual document, work flow and business process, event and subscription) are not included.”
Wikipedia says a compound document is:
“a regular text document intermingled with non-text elements such as spreadsheets, pictures, digital videos, digital audio, and other multimedia features.”
Basically, that definition of a compound document describes virtually every modern web page. So the CMIS spec seems, from the outset, targeted at serving individual documents with associated metadata or fields around them, rather than the compound content objects that make up a modern web page.
Reading further, I also found it disconcerting that two of the base object types defined in CMIS are “Folder” and “Document”. These are very file-system-like types, and I can see how one might jump to the premature conclusion that CMIS is a glorified versioning file system.
To me, the question is not whether CMIS was built for, or is appropriate for, WCM. Rather, knowing it is inevitable that CMIS will be used in the WCM world, how can we look at CMIS and make it work for WCM?
Digging deeper into the CMIS spec is helpful here; what seem like architectural issues are mainly nomenclature mistakes held over from legacy ECM thinking. It turns out that the “Document” type is really very “Object”-ish or “Node”-ish, in that it can have any number of subtypes, each with any number of single- or multi-valued properties. So you could define a “Document” subtype whose properties include text, XML or HTML, a title, tags, etc., so we are good here. Also, and this is important, a “Document” in CMIS can be filed in one or more “Folders”, or in no folder at all (repository permitting, since multi-filing and unfiling are optional capabilities). This means that content in CMIS does not need to be filed, and that the mapping between folders and content does not need to be a strict one-to-many hierarchy. “Folders”, in fact, need not be treated as folders at all; they can be seen and used as “Collections” or as taxonomies. Things are starting to look better for CMIS in WCM.
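To make the point concrete, here is a toy in-memory sketch in Python of the two ideas above. It is not the CMIS API and not how dotCMS or any other repository implements it. It models a “Document” subtype whose properties are arbitrary single- or multi-valued fields, and “Folders” used as collections, so one document can be filed in several of them or in none. The class names, the “web:article” type id, and the folder paths are all hypothetical. In a real CMIS 1.0 repository, multi-filing and unfiling are optional capabilities, so a client should check what the repository advertises before relying on them.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical toy model of CMIS-style objects; NOT the CMIS API.


@dataclass
class CmisDocument:
    object_id: str
    object_type: str  # illustrative subtype id, e.g. "web:article"
    # Arbitrary properties; a list value stands in for a multi-valued property.
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Repository:
    documents: dict[str, CmisDocument] = field(default_factory=dict)
    # Folder path -> ids of documents filed there. Folders act like
    # collections/taxonomy terms: a document may sit in many, or in none.
    folders: dict[str, set[str]] = field(default_factory=dict)

    def add(self, doc: CmisDocument, *folder_paths: str) -> None:
        """Store a document and file it in zero or more folders."""
        self.documents[doc.object_id] = doc
        for path in folder_paths:
            self.folders.setdefault(path, set()).add(doc.object_id)

    def children(self, path: str) -> list[CmisDocument]:
        """Documents filed in one folder, ordered by id."""
        return [self.documents[i] for i in sorted(self.folders.get(path, set()))]

    def parents(self, object_id: str) -> list[str]:
        """Every folder a document is filed in (multi-filing)."""
        return sorted(p for p, ids in self.folders.items() if object_id in ids)

    def unfiled(self) -> list[CmisDocument]:
        """Documents that are not filed in any folder (unfiling)."""
        filed = set().union(*self.folders.values())
        return [d for i, d in sorted(self.documents.items()) if i not in filed]


repo = Repository()
article = CmisDocument("doc-1", "web:article", {
    "title": "Spring Catalog",
    "body": "<p>New arrivals...</p>",  # HTML stored as a plain property
    "tags": ["spring", "sale"],        # multi-valued property
})
repo.add(article, "/topics/fashion", "/campaigns/spring")
footer = CmisDocument("doc-2", "web:snippet", {"title": "Footer"})
repo.add(footer)  # deliberately left unfiled

print(repo.parents("doc-1"))  # ['/campaigns/spring', '/topics/fashion']
print([d.object_id for d in repo.unfiled()])  # ['doc-2']
```

Here the same article shows up under both a topic and a campaign, which is the “Folders as taxonomy” reading of the spec rather than a file-system one.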
One limitation of CMIS that should be rectified in a future version (perhaps it was discussed and I missed it; if so, forgive this digression) is that a “Document” and all of its subtypes can have only a single contentStream associated with them. The addition of Renditions, or alternate views of the content stream, is helpful: it means you can retrieve an image from the primary content stream and its thumbnail as a “Rendition”. But the web world needs the ability to retrieve several different content streams from a single content object. Case in point: say you have a Document type called “Interviews” with properties like Interviewer, Interviewee, Headshot (image), Transcript (HTML), Audio (MP3), tags, etc., all of which should be lumped together in a single entity for searching and retrieval. You could fake it with Renditions (which are read-only via CMIS), or store a single multipart file in the contentStream (which would be a nightmare for CMIS clients), but sometimes it would be much friendlier, and more content-node-like, to have the option of keeping all the heavy assets together in a single node without having to dig through relationships to find them.
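The relationship-based workaround for the “Interviews” case can be sketched the same way. Again, this is a hypothetical toy model in Python, not the CMIS API. Each document carries at most one primary content stream, renditions are kept as separate alternate views, and the headshot and audio live in their own documents linked by relationships. The “web:hasAsset” relationship type, the “role” property, and the gather_assets helper are invented for illustration. The helper shows the extra relationship walk a client has to do to reassemble assets that a multi-stream node would hold directly.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical toy model, NOT the CMIS API. Each document holds at most one
# primary content stream, mirroring the CMIS 1.0 constraint discussed above.

@dataclass
class ContentStream:
    mime_type: str
    data: bytes


@dataclass
class Doc:
    object_id: str
    properties: dict = field(default_factory=dict)
    content_stream: Optional[ContentStream] = None  # zero or one primary stream
    # Alternate views such as a thumbnail; read-only via CMIS, not enforced here.
    renditions: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source_id: str
    target_id: str
    rel_type: str  # hypothetical relationship type, e.g. "web:hasAsset"


def gather_assets(doc_id, docs, relationships, rel_type="web:hasAsset"):
    """Reassemble a logical multi-stream object: the document's own primary
    stream plus the primary streams of related asset documents, keyed by each
    asset's hypothetical "role" property."""
    doc = docs[doc_id]
    streams = {}
    if doc.content_stream is not None:
        streams["primary"] = doc.content_stream
    for rel in relationships:
        if rel.source_id == doc_id and rel.rel_type == rel_type:
            target = docs[rel.target_id]
            if target.content_stream is not None:
                role = target.properties.get("role", target.object_id)
                streams[role] = target.content_stream
    return streams


docs = {
    "int-1": Doc("int-1",
                 {"interviewer": "A. Reporter", "interviewee": "B. Founder",
                  "tags": ["startup"]},
                 ContentStream("text/html", b"<p>Transcript...</p>")),
    "img-1": Doc("img-1", {"role": "headshot"},
                 ContentStream("image/jpeg", b"\xff\xd8"),
                 renditions={"thumbnail": ContentStream("image/jpeg", b"\xff\xd8")}),
    "aud-1": Doc("aud-1", {"role": "audio"}, ContentStream("audio/mpeg", b"ID3")),
}
rels = [
    Relationship("int-1", "img-1", "web:hasAsset"),
    Relationship("int-1", "aud-1", "web:hasAsset"),
]
print(sorted(gather_assets("int-1", docs, rels)))  # ['audio', 'headshot', 'primary']
```

The sketch keeps the single-stream constraint and pays for it with the relationship lookup, which is the overhead a node holding several content streams would avoid.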
But make no mistake: even with these self-imposed limitations, CMIS will be useful in WCM. Take a common integration use case. Let’s say a company has an eCommerce store with a catalog, products, a shopping cart, etc., which works well transactionally but looks awful. The problem is that the marketing department actually wants to sell products, which means giving the marketers more control over the templates, the supporting web assets, and the reviews, videos, and testimonials that surround and decorate the product pages. We are finding that the CMIS 1.0 support implemented in dotCMS 1.9 goes a long way toward solving these kinds of issues. With CMIS, an integrator can reach into the dotCMS content repository, pull back templates, videos, personalized content, etc., and use them on external sites. It seems to me that this has very little to do with ECM-style document management.