My Collection update

Mon Jan 11 04:01:32 UTC 2010

Sveto,

Huge apologies for the delay in getting back to you with advice and  
some code review. I've had a chance to take a look at your code, and I  
think things are coming together nicely.

I've included some comments and suggestions below. I'm wondering if  
you'd also be willing to give us a code tour some time this week? That  
will help me better understand your intentions with the code.

On 7-Jan-10, at 12:12 PM, Svetoslav Nedkov wrote:
> 1. The integration of the user collection with the artifact view is  
> quite ready, I'm currently having an issue with a selected dom  
> element that doesn't seem to accept click events when passed to  
> another component, but I hope that I'll be able to fix that for a  
> short time tomorrow that's why I won't fill in the details.

Were you able to fix the issue? If not, tell us more. It sounds  
interesting.

> 2. To provide a better way of testing this, tomorrow I will create a  
> script that generates empty shadow documents for the artifacts that  
> are seen in the browse page. This way we will be able to add/remove  
> all the artifacts that we currently see.
>
> 3. Also another concern I have is regarding the data structure we  
> use. Last talk on the subject we had we settled for a centralized  
> user database, but I understand that this is not planned and intend  
> to remove it completely, replacing it with a suitable CouchDB view  
> that will be used only for getting data. This will eliminate the  
> problem with redundancy I've mentioned in my previous email.
>
> I'd like to hear your opinion on the subject.

I'm afraid I've caused terrible confusion around the issue of shadow  
databases and how collections should relate to artifacts. This was  
undoubtedly inspired by some bad code I sketched out while talking  
about the idea of shadows at the dev meeting back in November, and I'm  
really sorry for the confusion. Let's see if I can try to clear this up.

Justin is right, the point of shadow documents is to maintain two  
different "namespaces" for writing data. The first contains data  
sourced from the museum directly, in its original format. Everything  
in the database from Engage 0.1 fits within this category, since it's  
all read-only.

The second document, or shadow, stores any contributions from users  
that apply to a particular museum-sourced entity. So, for example, if  
we wanted to add an array of user tags to each artifact, we'd write  
them to the shadow database instead of modifying the museum-derived  
document directly. That way we can clearly identify where data  
originated so it can move freely back upstream to the museum if  
needed.

In trying to illustrate this at the dev meeting a while ago, I  
incorrectly suggested that pointers to the collection should be  
located within the artifact itself. That's not necessary, and it's  
much simpler just to have collection documents refer to artifacts. You  
had it right the first time.

Circular references are, as you pointed out in your last email,  
problematic. Having the artifact/collection "relationship" stored in  
both documents is unnecessary and does raise the sorts of  
transactional issues that a well-designed Couch database needn't  
ordinarily be too concerned with.

So, I'd suggest getting rid of any references to collections within  
artifact documents. That way, you won't even need to maintain a shadow  
artifact document at all, and you can simply write to the collection  
document without concern for shadows or mapping from a museum schema.  
Just write to your collection document and you're done; this should  
simplify your code a fair bit.

As for your specific question, I agree that we'll probably often have  
views in Couch that will provide a merged, read-only view of an entity  
containing data from both the main document and its shadow. We'll also  
have some infrastructure in our data access layer on the server that  
takes care of writing to the shadow. It's not something we've worked  
out yet, but your suggestion of creating shadows on the fly when  
they're not there sounds like a reasonable approach.
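
To make the shadow idea a bit more concrete, here's a rough sketch in plain JavaScript of how a merged view and on-the-fly shadow creation might fit together. Everything here is hypothetical: the function names, the "contributed" field, and the in-memory "shadows" object standing in for the shadow database are assumptions for illustration, not anything we've built.

```javascript
// Hypothetical sketch: names and document fields are assumptions.

// Builds a merged view of a museum document and its shadow. Museum-sourced
// data stays under "museum" and user contributions under "contributed", so
// the origin of each piece of data remains identifiable.
function mergeWithShadow(museumDoc, shadowDoc) {
    return {
        id: museumDoc._id,
        museum: museumDoc,
        contributed: shadowDoc ? shadowDoc.contributed : {}
    };
}

// Looks up the shadow for an artifact, creating an empty one on the fly
// when it isn't there yet. "shadows" stands in for the shadow database.
function getOrCreateShadow(shadows, artifactId) {
    if (!shadows[artifactId]) {
        shadows[artifactId] = {
            type: "shadow",
            artifactId: artifactId,
            contributed: {}
        };
    }
    return shadows[artifactId];
}

// Writes a user tag to the shadow, never to the museum-derived document.
// Adding the same tag twice leaves the shadow unchanged.
function addTag(shadows, artifactId, tag) {
    var shadow = getOrCreateShadow(shadows, artifactId);
    if (!shadow.contributed.tags) {
        shadow.contributed.tags = [];
    }
    if (shadow.contributed.tags.indexOf(tag) === -1) {
        shadow.contributed.tags.push(tag);
    }
    return shadow;
}
```

The point of the split is that the museum document is never touched by addTag(), so anything under "contributed" can be identified as user-sourced later on.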

The good news is that so far we don't really have a need for shadow  
documents, so we can sidestep this complexity. I expect in the future  
we'll probably have to tackle these issues, but for now we needn't  
sweat it. Sorry for the confusion.

> 4. I think that the idea to generate a CouchDB unique id for the  
> user session is a good idea, just to clarify - will we create a  
> document for the session that can be expanded in the future or for  
> now just use the functionality that allows us to generate uuids.

Not wanting to risk any ambiguity, I think we should treat these as  
user IDs, rather than session IDs. They won't correspond to any formal  
session state on the server-side (we don't have session state), and  
they are really a way for us to keep track of a particular user. Once  
the designers have resolved how logins will work, I assume that we'll  
keep track of user login/password information via these ids as well.  
So, inspired by how you've designed collection documents in the  
"users" database, here's how I'm thinking we might represent it all:

{
   type: "user",
   _id: <crazy-long-couch-uuid-here>,
   email: <not used at first, but perhaps eventually filled in by the user>,
   collection: {
     artifacts: [
       {
         museumId: "mmi",
         id: <crazy-long-couch-artifact-id-here>
       }
     ]
   }
}

In effect, it's the same structure that you've laid out, except that  
the document represents the whole user rather than just the  
collection. Does this seem like a reasonable approach, or am I missing  
anything obvious?
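
For what it's worth, here's a minimal sketch of how collecting and uncollecting might look against a document of that shape. The helper names (createUser, findArtifact, collect, uncollect) are made up for illustration, and the code works on plain in-memory objects rather than talking to Couch.

```javascript
// Hypothetical helpers for the user document sketched above.

// Creates a fresh user document with an empty collection.
function createUser(uuid) {
    return {
        type: "user",
        _id: uuid,
        email: null, // not used at first
        collection: { artifacts: [] }
    };
}

// Returns the position of an artifact in the user's collection, or -1.
function findArtifact(user, museumId, id) {
    var artifacts = user.collection.artifacts;
    for (var i = 0; i < artifacts.length; i++) {
        if (artifacts[i].museumId === museumId && artifacts[i].id === id) {
            return i;
        }
    }
    return -1;
}

// Adds an artifact reference; collecting the same artifact twice is a no-op.
function collect(user, museumId, id) {
    if (findArtifact(user, museumId, id) === -1) {
        user.collection.artifacts.push({ museumId: museumId, id: id });
    }
    return user;
}

// Removes an artifact reference if it is present.
function uncollect(user, museumId, id) {
    var index = findArtifact(user, museumId, id);
    if (index !== -1) {
        user.collection.artifacts.splice(index, 1);
    }
    return user;
}
```

Since the relationship lives only in the user document, each operation is a single document write with nothing to keep in sync on the artifact side.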

So, onto some code review:

* Standalone previewability: Sometimes it's really nice to test a  
component without needing the server or database running. I couldn't  
get the MyCollection component to run standalone due to some path  
problems. I also didn't see any sample data, so you'll probably want  
to implement that as well. Take a look at the other component or the  
work Boyan has done with Capture for reference. It's a bit of extra  
work, but really helpful.

* Minor path issue: when I checked out your code, Infusion was in a  
directory called "infusion," but your paths refer to  
"fluid-infusion." I renamed the directory and it worked fine. To simplify  
things, I'd suggest just bringing in Infusion as an external. We still  
need a better way for non-committers to work on release-level code  
(branching is all we've got at the moment--wish we were using Git), so  
it's something we'll try to talk about at the dev meeting next week.

* You mount your myCollection data feed and template inside the  
"/artifacts" URL space. I'm thinking that since these documents may  
actually represent users, we should mount them as a top level  
resource. Here's a sketch for now, and then we can consider a more  
resource-oriented (rather than view-oriented) approach later:
    User data feed: http://server.org/users/collection.json
    MyCollection template: http://server.org/users/collection.html

* I'm not fully clear on what's happening in your render() method in  
the MyCollection.js component. I'm confused about the block where you  
call fetchTemplates() around lines 122-133. If you're calling  
reRender(), you should already have the parsed templates and don't  
need to fetch the raw HTML template again, right?

* Could some of the code in your component--such as getArtifactIds()  
and the other get...() functions--be implemented as Couch views or  
model mapping functions instead?
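
As a rough illustration of the Couch view idea, here's what a map function for pulling artifact ids out of user documents might look like. I haven't checked this against your getArtifactIds(), so treat the view name "artifactsByUser" and the document shape as assumptions. The emit() defined at the top is only a stand-in so the function can be tried outside the database; inside Couch, emit() is provided by the view server.

```javascript
// Stand-in for CouchDB's built-in emit(), only so the map function can be
// run locally. Inside Couch you would not define this yourself.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Map function for a hypothetical "artifactsByUser" view: emits one row per
// collected artifact, keyed by the user document's id. Documents that are
// not users, or that have no collection, emit nothing.
function artifactsByUser(doc) {
    if (doc.type !== "user" || !doc.collection || !doc.collection.artifacts) {
        return;
    }
    var artifacts = doc.collection.artifacts;
    for (var i = 0; i < artifacts.length; i++) {
        emit(doc._id, {
            museumId: artifacts[i].museumId,
            id: artifacts[i].id
        });
    }
}
```

In a design document the body would be stored as an anonymous function under views.artifactsByUser.map, and querying the view with a user's id as the key would return that user's artifact references.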

I noticed that the code in your updateDatabase.js file could use some  
work. Here are a few issues I noticed:

* There's a fair bit of code duplication here. If you take a look at  
your getCollection(), getCollectionById(), and getShadowArtifact()  
functions, they share a fair bit of boilerplate code. It should get  
simpler without shadow artifacts, but perhaps you can factor some of  
this code out into a single, reusable function? collection() and  
uncollect() also share a pattern. As an aside, this sort of data  
access is now pretty common across all services, so Yura and I are  
going to dig into some framework code to reduce this code redundancy  
significantly.

* I think we could be a bit more resource-oriented in our URL design  
here. Generally, we want mounted handlers to represent a real thing in  
the system--resources such as artifacts and collections--and then use  
HTTP methods for operating on those resources. In particular, I wonder  
if there's a way to implement your collection operations differently.  
Here's a sketch off the top of my head, but it will need a bit of work to  
think through before implementing:

    http://server.org/users/xyz/collection/artifacts/abc
      POST adds the artifact identified by the id "abc" to the "xyz"  
        user's personal collection
      DELETE uncollects the artifact from the user's personal collection
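
Here's a small, framework-free sketch of how that URL and those two methods could be dispatched. It doesn't use Kettle's actual API; the handler name, the in-memory "store" object, and the choice of status codes are all assumptions made for the sake of illustration.

```javascript
// Matches /users/<userId>/collection/artifacts/<artifactId>
var COLLECTION_ROUTE = /^\/users\/([^\/]+)\/collection\/artifacts\/([^\/]+)$/;

// Hypothetical handler: "store" maps user ids to arrays of artifact ids and
// stands in for the real database. Returns an object with an HTTP status.
function handleCollectionRequest(method, path, store) {
    var match = COLLECTION_ROUTE.exec(path);
    if (!match) {
        return { status: 404 }; // not a collection resource
    }
    var userId = match[1];
    var artifactId = match[2];

    if (method === "POST") {
        // Collect: create the user's list if needed, skip duplicates.
        if (!store[userId]) {
            store[userId] = [];
        }
        if (store[userId].indexOf(artifactId) === -1) {
            store[userId].push(artifactId);
        }
        return { status: 204 };
    }

    if (method === "DELETE") {
        // Uncollect: remove the artifact if the user has collected it.
        var collected = store[userId] || [];
        var index = collected.indexOf(artifactId);
        if (index !== -1) {
            collected.splice(index, 1);
        }
        return { status: 204 };
    }

    return { status: 405 }; // any other method isn't supported here
}
```

The resource (a collected artifact) is named entirely by the URL, and the HTTP method alone says what to do with it, so there's no need for separate "collect" and "uncollect" endpoints.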

I realize there's an asymmetry between this more resource-oriented  
style of URL and some of our existing conventions. I'd like to move  
towards a more resource-oriented way over time, but I realize it may  
take some new infrastructure in Kettle as well as a bit of design.  
Another topic for the dev meeting.

Whew, super long email. Hopefully it's helpful and not too much to  
digest. Don't hesitate to keep up the thread if you have any  
questions or if there are things I'm missing here. I'm really  
interested in your ideas, suggestions, and alternative designs for any  
of these issues, too!

Colin

---
Colin Clark
Technical Lead, Fluid Project
http://fluidproject.org