Notes from the development meeting

Thu Dec 3 20:18:16 UTC 2009

Hi everyone,

I took some rough notes at today's meeting. I hope others who were at  
the meeting will add to and correct this.

Michelle

Kettle

In order to use Kettle for Decapod there are two things that need to  
be done. We require the ability to make system calls and we also  
require Kettle to be untangled from Engage.

Our requirement for system calls comes out of needing to start command
line processes, determine the status of a command line process that
we've started and stop processes. We may also need to queue long
running processes rather than run them all in parallel.
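
To make the start / status / stop / queue requirement concrete, here is
a rough sketch of a queue that limits how many processes run at once.
None of these names exist in Kettle. In particular, startProcess is a
hypothetical function that the host runtime would have to supply: it
is assumed to launch a command, call onExit(exitCode) when the command
finishes, and return a handle with a stop() method.

```javascript
// Hypothetical sketch only: createProcessQueue and startProcess are
// illustrative names, not part of Kettle or of any existing library.
function createProcessQueue(startProcess, maxRunning) {
    const jobs = [];   // every job ever submitted; a job's id is its index
    let running = 0;   // number of jobs currently running

    // Start the oldest queued job if there is a free slot.
    function startNext() {
        if (running >= maxRunning) { return; }
        const job = jobs.find(function (j) { return j.status === "queued"; });
        if (!job) { return; }
        job.status = "running";
        running += 1;
        job.handle = startProcess(job.command, function (exitCode) {
            if (job.status !== "running") { return; } // stopped earlier
            job.status = exitCode === 0 ? "done" : "failed";
            running -= 1;
            startNext();
        });
    }

    return {
        // Queue a command and return an id for later status/stop calls.
        submit: function (command) {
            jobs.push({ command: command, status: "queued", handle: null });
            startNext();
            return jobs.length - 1;
        },
        // One of "queued", "running", "done", "failed" or "stopped".
        status: function (id) {
            return jobs[id].status;
        },
        // Stop a running job, or cancel one that is still queued.
        stop: function (id) {
            const job = jobs[id];
            if (job.status === "running") {
                job.handle.stop();
                job.status = "stopped";
                running -= 1;
                startNext();
            } else if (job.status === "queued") {
                job.status = "stopped";
            }
        }
    };
}
```

With maxRunning set to 1 this runs long jobs strictly one after
another. A larger value allows limited parallelism.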

Two possible candidates that have been mentioned are Quartz  
http://www.quartz-scheduler.org/ and libexpect  
http://www.tcl.tk/man/expect5.31/libexpect.3.html

One issue we talked about at the meeting was that we need to think
carefully about what we can take with us when we move from Rhino to V8.
We likely don't need something at the level of Quartz or libexpect
since our needs are so simple.

The best candidate for porting from Rhino to V8 is something written
in JavaScript. The suggestion was made that perhaps we should write
our own despite the maintenance costs since we have such simple needs.

The current plan for Decapod's 0.1 is to write our own extremely
simple system call support - just enough to port Decapod from
CherryPy to Kettle. This allows us to defer the decision about a
third-party package. We will also continue to look for alternatives.

As far as splitting out Kettle and Engage, I sent a detailed message  
to the list about what needs to be done and the goals for the work.  
The one issue that I hadn't mentioned was what to do with Kettle  
dependencies. We decided that the most reasonable approach was to have  
two configuration files - one for Kettle and one for the application.  
The Kettle configuration would specify all of Kettle's dependencies  
and the application config would specify all of its dependencies.  
Kettle would compare the two to ensure we are not loading dependencies  
twice.
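
The comparison of the two configurations could look roughly like the
sketch below. The config shape (a "dependencies" array of script
paths) and the function name are assumptions made for illustration.
The actual format of the configuration files has not been decided.

```javascript
// Hypothetical sketch: the "dependencies" array and resolveDependencies
// are illustrative, not an existing Kettle format or API.
function resolveDependencies(kettleConfig, appConfig) {
    const seen = new Set();
    const toLoad = [];       // each dependency once, Kettle's first
    const duplicates = [];   // dependencies that were listed more than once

    const all = kettleConfig.dependencies.concat(appConfig.dependencies);
    all.forEach(function (path) {
        if (seen.has(path)) {
            if (duplicates.indexOf(path) === -1) {
                duplicates.push(path);
            }
        } else {
            seen.add(path);
            toLoad.push(path);
        }
    });

    return { toLoad: toLoad, duplicates: duplicates };
}
```

Kettle would then load only the paths in toLoad, and could log the
duplicates so that application authors know which entries are
redundant.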

Date Picker

The collection space work is currently in need of a date picker. Yura  
has been looking at the jQuery date picker but unfortunately has hit  
some accessibility issues with it. The biggest problem seems to be
keyboard behaviour - when using the keyboard, it is impossible to tell
what you are changing until the change has been made.

Yura is going to look at other date pickers such as YUI's and  
Google's. He's also going to talk to Erin about what the long term  
requirements of date selection are. One issue to keep in mind is  
support for fuzzy dates such as 'circa 1900' and 'paleolithic era'.  
There is also a concern about the actual format the date is stored in.
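
As one possible answer to the storage question, a fuzzy date could be
kept as a display label plus an inclusive range of years. This is only
a sketch of one option, not something decided at the meeting. The
field names are made up, and the year bounds in the usage note are
illustrative rather than authoritative.

```javascript
// Hypothetical storage format: a label for display plus an inclusive
// year range. Negative years stand for BCE.
function fuzzyDate(label, earliestYear, latestYear) {
    if (earliestYear > latestYear) {
        throw new Error("earliestYear must not exceed latestYear");
    }
    return {
        label: label,
        earliestYear: earliestYear,
        latestYear: latestYear
    };
}

// An exact year is just a range of width zero.
function exactYear(year) {
    return fuzzyDate(String(year), year, year);
}

// True if the two dates could refer to the same point in time.
function mayOverlap(a, b) {
    return a.earliestYear <= b.latestYear && b.earliestYear <= a.latestYear;
}
```

For example, fuzzyDate("circa 1900", 1890, 1910) may overlap
exactYear(1905), while a very wide range such as
fuzzyDate("paleolithic era", -2500000, -10000) does not. The label is
what the user typed, so nothing is lost when the range is a guess.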

Databases

We are currently working on two different levels when it comes to data  
access. We are planning architecture and brainstorming about how we  
handle and organize data in a database and in data feeds and how that  
will be internalized in the framework. At the same time we are working  
on a practical level - we need to get data out of and into CouchDB.

We are starting to see some short term implementations being built and  
we are continuing to think about the long term plan.

In our demo we have data from MMI and McCord and we have CouchDB set up
so that each museum's data is in its own database. Sveto followed that
lead and put users in their own database too. This work speaks to the
future ability to federate users across different museum systems.

One thing we need to keep in mind is that there may be groups who want  
to collaborate but who don't want their data to be housed on the same  
server. We also need to be careful not to think of Couch as just a
database. It's also a set of data feeds. It's at the level of the
feeds and APIs that we need to think about connections between data.

Going forward it seems that we should consider having all the data in  
a particular Engage instance live in the same database in CouchDB.

One issue we talked about was how we would distinguish data that came  
to us from a museum source from data that we collected.

One of the advantages of a schema-less database is that it enables  
bi-directional flow of data. Our data needs to go back upstream.

We will likely have 'shadow documents' in the system. For example an  
artifact, like the Spock Decanter, wouldn't have a single document in  
the database. There would be at least two - one from the museum and  
one containing data generated from Engage. In fact, there will be  
museums who give us access to several sources of data so there is a  
possibility of a single artifact having several documents whose data  
will be merged before being shown to the user. We can combine the data
using views in CouchDB or perhaps some other implementation.
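
Whatever mechanism does the combining, the merge itself might look
something like this sketch. It assumes each document carries a
hypothetical "source" field (for example "museum" or "engage"), which
would also let us tell museum data apart from data we collected. The
merge rule (later documents win for plain fields, arrays are
concatenated) is an assumption too.

```javascript
// Hypothetical sketch: mergeDocuments and the "source" field are
// illustrative and not part of any existing Engage code.
function mergeDocuments(docs) {
    const merged = { sources: [] };

    docs.forEach(function (doc) {
        // Remember where each piece of data came from.
        merged.sources.push({ id: doc.id, source: doc.source });

        Object.keys(doc).forEach(function (key) {
            if (key === "id" || key === "source") {
                return; // recorded in merged.sources instead
            }
            const value = doc[key];
            if (Array.isArray(value)) {
                // Arrays such as comments are combined, not replaced.
                const existing = Array.isArray(merged[key]) ? merged[key] : [];
                merged[key] = existing.concat(value);
            } else {
                // For plain fields the later document wins.
                merged[key] = value;
            }
        });
    });

    return merged;
}
```

The order of the docs array decides which source wins when two
documents disagree on a plain field, so that order is itself a policy
decision we would need to make.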

Concretely, thinking about the data that we currently have, a  
collection would be a document and an artifact would also be a  
document which would contain an array of comments. A user would also  
be a separate document. Here is a rough sketch of what a collection  
may look like. Note that there are 3 documents represented here whose  
data would be used when rendering a collection.

michellesCollection = {
   id: 12345,
   name: "Michelle's collection",
   user: "michelle at dsouza.org",
   comments: ["This collection rules!", "Me too!"],
   artifacts: [6789, 1234]
}

artifact = {
   id: 6789,
   name: "Left handed screwdriver"
}

shadowArtifact = {
   id: 6788,
   inCollections: [12345, 657],
   comments: ["This is not very different from a right handed screwdriver"]
}
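
To show how the three documents above might be pulled together when
rendering a collection, here is a rough sketch. The sketch above does
not say how a shadow document points at its artifact, so this assumes
a hypothetical "shadowOf" field. The function name is made up as well.

```javascript
// The three documents from the sketch above. shadowOf is a hypothetical
// field (not in the notes) linking a shadow document to its artifact.
const documents = [
    {
        id: 12345,
        name: "Michelle's collection",
        user: "michelle at dsouza.org",
        comments: ["This collection rules!", "Me too!"],
        artifacts: [6789, 1234]
    },
    { id: 6789, name: "Left handed screwdriver" },
    {
        id: 6788,
        shadowOf: 6789,
        inCollections: [12345, 657],
        comments: ["This is not very different from a right handed screwdriver"]
    }
];

// Build the data needed to render one collection.
function collectionForRendering(collectionId, docs) {
    const byId = new Map(docs.map(function (doc) { return [doc.id, doc]; }));
    const collection = byId.get(collectionId);

    const artifacts = collection.artifacts
        // Skip artifacts we have no document for (1234 in this sketch).
        .filter(function (id) { return byId.has(id); })
        .map(function (id) {
            // Gather comments from every shadow document of this artifact.
            const comments = docs
                .filter(function (doc) { return doc.shadowOf === id; })
                .reduce(function (all, shadow) {
                    return all.concat(shadow.comments || []);
                }, []);
            return { id: id, name: byId.get(id).name, comments: comments };
        });

    return {
        name: collection.name,
        user: collection.user,
        comments: collection.comments,
        artifacts: artifacts
    };
}
```

Calling collectionForRendering(12345, documents) gives the collection's
own name, user and comments plus one artifact entry that carries the
museum's name and the comment from the shadow document.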