Making assumptions about museum schemas

Wed Feb 10 18:01:03 UTC 2010

Hey all,

We've got a fairly ugly issue in Engage 0.3b related to the assumptions we are making in code about how artifact and exhibition data is structured in Couch.

Here's a quick refresher of the architectural approach we are working towards for Engage:

The goal is "schemalessness." At the database level, we don't want to force museums to conform to a rigid structure for their data. In other words, we don't tell a museum what schema they need to conform to in order to use Engage. Instead, we adapt to the diversity of their collection.

One of the chronic problems with integrating data across disparate systems in a museum is preserving the linkage between information in each system. By being schemaless, we avoid mismatches between competing schemas and doubts about which data source is accurate. Museums can easily query Engage's data feeds, extract data from the system, and put it directly back into their in-house content and collections management systems. Our architecture is fundamentally designed to enable data to move freely both upstream and down.

That said, we still want to shield presentation-layer code from having to contend with an infinite variety of schemas. To this end, we need to provide a single layer of framework code that can map data from a museum-specific format to something more stable for presentation on the Web. Museums should only have to provide a declarative document that represents the mapping or transformation of their data. They should never have to write complex code or database-specific routines. Mapping should happen dynamically at request time, and should be invisible to components.
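
To make the idea concrete, here's a minimal sketch of what a declarative mapping could look like. This is not how fluid.engage.mapModel() works today; the function names (getPath, applyMapping), the document shape, and the field names are all invented for illustration.

```javascript
// Hypothetical sketch only: not the actual fluid.engage.mapModel() code.

// Resolve a dotted path such as "artefact.label.title" against a document.
// Returns undefined if any segment along the way is missing.
function getPath(doc, path) {
    return path.split(".").reduce(function (value, segment) {
        return (value === undefined || value === null) ? undefined : value[segment];
    }, doc);
}

// Apply a declarative mapping. Keys are the stable field names that
// presentation code sees; values are paths into the museum's own document.
function applyMapping(doc, mapping) {
    var model = {};
    Object.keys(mapping).forEach(function (field) {
        model[field] = getPath(doc, mapping[field]);
    });
    return model;
}

// An invented museum document, in whatever shape the museum already uses.
var museumDoc = {
    artefact: {
        label: { title: "Snowshoes", object: "M2005.1" },
        images: { thumbnail: "thumb.jpg" }
    }
};

// The only thing the museum supplies: a plain data document, no code.
var museumMapping = {
    title: "artefact.label.title",
    accessionNumber: "artefact.label.object",
    thumbnail: "artefact.images.thumbnail"
};

// Done at request time; components only ever see the mapped model.
var artifact = applyMapping(museumDoc, museumMapping);
```

The point of the sketch is that the mapping is just data. A museum with a different structure would hand us a different mapping document, and neither the framework code nor the components would need to change.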

We took a stab at implementing this approach early in Engage 0.1, but I don't think we got the design quite right. Here's a JIRA I filed which outlines the issue:

http://issues.fluidproject.org/browse/ENGAGE-367

As a result of the problematic implementation of fluid.engage.mapModel(), we've had to start cutting corners. In particular, we're doing transformations in the CouchDB views themselves. I've documented this problem here:

http://issues.fluidproject.org/browse/ENGAGE-368

The problem with doing model-related transformations in Couch views is two-fold:

1. New adopters of Engage will have to write their own CouchDB views, which requires code. That's mean.
2. Free-text queries (such as Lucene views) can't reuse this functionality, causing a fundamental mismatch in structure, even for the same type of document.
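
Here's a rough sketch of the second point. The view function and the document shape are invented for illustration (they are not our actual views), and emit() is stubbed out so the snippet runs outside of Couch.

```javascript
// Stand-in for CouchDB's emit() so this sketch runs outside the database.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// A view that bakes a museum-specific transformation into the map function.
// Anyone whose documents are shaped differently has to rewrite this code.
function transformingView(doc) {
    if (doc.artefact) {
        emit(doc.artefact.label.object, {
            title: doc.artefact.label.title
        });
    }
}

// An invented document in a museum-specific shape.
var doc = {
    _id: "abc",
    artefact: { label: { title: "Snowshoes", object: "M2005.1" } }
};

transformingView(doc);

// What a component gets from the view versus what it gets when the same
// document comes back untransformed, for example from a free-text search.
var viewShape = Object.keys(rows[0].value);
var rawShape = Object.keys(doc);
```

The two shapes don't match, so any code consuming both sources ends up hard-coding knowledge of the raw structure somewhere.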

As a result, Sveto had to further propagate the problem of making assumptions about the structure of data, this time in the My Collection service itself. I've explained the bug here:

http://issues.fluidproject.org/browse/ENGAGE-369

The bottom line is that our code now makes fundamental assumptions about the data being structured in McCord's way ("I'll do it McCord's Way?"). With the Engage 0.3b release coming fast, I don't see any good, expedient resolution to these issues, but they will be fatal in the long run. We're going to have to leave these bugs in the system temporarily, but I want to make sure we're all thinking about these issues and ways to resolve them after the release.

Thanks,

Colin

---
Colin Clark
Technical Lead, Fluid Project
http://fluidproject.org