Publishers: Structured Data and Content Management Systems
Last week, I caught the tail-end of an interesting Twitter conversation between @martinstabe, @markng, and @chrised – regarding the inability of most Content Management Systems (CMS’s) to deal with various non-standard content types. This is an issue faced by every online publisher, and is felt most by more traditional publishers who tend to a) have an old CMS platform that was bought many years ago, and b) not employ many development staff to build out new features and functionalities.
A few key quotes from the conversation to kick off:
- By @martinstabe: “We’ve been building all of our data apps external to the main CMS, latest eg : http://bit.ly/bjRnEx #yam”
- By @martinstabe: “Inability to deal with structured data remains number #1 editorial CMS problem”
- By @chrised: “Is it a CMS issue? Might be better handled through data access + presentation plug-in?”
- By @markng: “structured data should be woven into news +advertising in a way that isn’t possible with external plugins and widgets.”
I also came across this Washington Post article, which bemoans their lack of elegant delivery of government data, and says their audience deserves better!
“The Post knows it’s lagging. Old technology and short staffing are to blame. Raju Narisetti, the managing editor who oversees the Web site, said its decade-old content management system ‘can’t really handle a lot of the databases and open-access information.’”
And it’s not just government data that causes problems. Using our semantic extraction technology, we are building a topic-based news portal for a great client, but have found that no mainstream CMS does what we want it to do. Content Management Systems are mostly orientated towards the management process, not the best delivery of news. To have fully integrated semantic data is as much a workflow issue as a delivery issue. All existing mechanisms are about augmenting the delivery in some way, mainly because workflow integration is too tricky. CMS’s are generally big, old and slow, and they were built to handle news as it used to be organised, that is, chronologically, by department or by source.
The fundamental issue is that CMS’s are too vertically integrated, much like newspapers. They have tried to solve the whole problem, and therefore have not been flexible enough to adapt to new nuances.
To keep pace with innovation in news format and content type, the best option is a “platform” approach, where system elements can be quickly interchanged and internal APIs allow for flexible communication between layers.
When the content is separated from the presentation layer, it becomes just one of many possible input options, alongside and potentially intertwined with government data feeds, externally aggregated content, semantic metadata, geodata, and much more!
Also, the website becomes one of many different output options. With a well-structured API it is quick to deliver new channels, interfaces, partnerships, and even to offer a content and/or data service via API to the wider development community.
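To make the idea a little more concrete, here is a minimal sketch in Python of what such a separation might look like. It is only an illustration, not a description of any real CMS or of our own system: the names ContentItem, ContentStore, render_html and render_json, and the sample items, are all made up for the example. Content from different inputs sits in one store behind a small internal API, and each output channel is just another function that reads from it.

```python
import html
import json
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One piece of content, stored independently of any presentation."""
    item_id: str
    headline: str
    body: str
    source: str  # hypothetical labels, e.g. "newsroom" or "government-feed"
    metadata: dict = field(default_factory=dict)  # semantic topics, geodata, etc.


class ContentStore:
    """Hypothetical internal API shared by every input and output layer."""

    def __init__(self):
        self._items = {}

    def add(self, item):
        self._items[item.item_id] = item

    def by_topic(self, topic):
        # Select on semantic metadata rather than on date, desk or source.
        return [i for i in self._items.values()
                if topic in i.metadata.get("topics", [])]


def render_html(items):
    """Output channel 1: a fragment for the website."""
    return "\n".join(
        f"<article><h2>{html.escape(i.headline)}</h2>"
        f"<p>{html.escape(i.body)}</p></article>"
        for i in items
    )


def render_json(items):
    """Output channel 2: the same content offered as a data service."""
    return json.dumps(
        [{"id": i.item_id, "headline": i.headline, "source": i.source}
         for i in items]
    )


store = ContentStore()
# Two different inputs land in the same store.
store.add(ContentItem("a1", "Council approves budget", "Full story...",
                      "newsroom", {"topics": ["budget"]}))
store.add(ContentItem("d1", "Spending by department", "Table of figures...",
                      "government-feed", {"topics": ["budget"], "geo": "London"}))
store.add(ContentItem("a2", "Cup final preview", "Match report...",
                      "newsroom", {"topics": ["sport"]}))

budget_items = store.by_topic("budget")
print(render_html(budget_items))
print(render_json(budget_items))
```

The point of the sketch is that adding a new channel (a mobile app, a partner feed) means writing one more render function, and adding a new input means one more call to add, without touching anything else.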
The fact that CMS’s are too vertically integrated is an exact reflection of the position news organisations find themselves in. They have been so focused on the overall “content workflow > print production > distribution > advertising sales” process that they have missed the fact that the publishing monolith has now broken up into several separate new markets, each with its own innovators and its own need to change. The main issue when reorganising is breaking the publishing process into separate spokes for content creation and content delivery.
News organisations are slowly realising that delivering “data as news” is vital to defend their position as the go-to resource for up-to-date information and analysis.
We are building some very interesting solutions to the above problems for several brands and publishers, and I will post further with more details about our approach and findings.





Well-structured API = very good.
But I think you’ve given too much credit for the core reason why CMSs don’t serve publishers well. It’s not because they are designed to support print production, because most of them don’t (and if they do, they’re not good at it).
The problem, however, is that they are indeed monolithic. Control over templates and access to API is largely confined to a small set of developers such that any modification is a site-wide problem and has to be budgeted, project-managed and generally workflowed into oblivion. Security is an additional issue as people either have access to the templates or they don’t. There’s little support that I can find for ‘sandboxed’ programming work that might affect a small proportion of the content that could do with something extra.
Where those using blog platforms have an advantage is that you have a clean separation between the core CMS engine and the bits on top (such as plug-ins) that either make life easier/faster/better for the editor or improve presentation.
Editorial departments are gradually acquiring a collection of people capable of basic scripting and these could be deployed more usefully to come up with quick additions to a basic structure if the access control was at a finer level of granularity (and which is hardly rocket science, unless you’re writing a CMS, it seems). I’ve lost count of the number of times that I’ve come up against something in a CMS form that could be greatly improved with a little Perl or Python scripting without damaging anything. Instead, people are having to paste the same item of text into four different fields. This is before we even get close to trying to apply some form of semantic annotation, such as microformats or whatever.
Every time I hear the word API or MVC or Java I throw up a little in my mouth. Some people think the solution to EVERYTHING is a newfangled API. The problem with APIs is that they are static, and when something new comes along you have to have developers hack against a monolith of API hooks just to get a new section on a page.
The problem is that CMSs try to define every single piece of content, and for the most part I see APIs doing the same thing. It’s just another wrapper around the problem.