My Collection update

Mon Jan 11 15:08:56 UTC 2010

Resending to the list, because I used gmail by mistake.

Hello Colin,

Thank you for your answers and the code review. The timing is perfect,
because I've finished this stage of the integration and needed a new
direction.

As for a code tour, I'm ready to do one on Thursday (perhaps during the
developers' meeting?) or Friday.
I'm not sure how many of the changes I have in mind I'll manage to
implement before the code tour, but I plan to start with the user IDs,
continue with the database structure, and then address the points you
raised in the code review.

Concerning the database structure, I think I've understood your idea,
and hopefully there will be no more issues with it once I change the
structure. The change should be as simple as eliminating the shadow
documents and creating a new view that maps artifact IDs to collection
IDs.
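To make the planned view concrete, here is a minimal sketch of a CouchDB
map function that maps artifact IDs to collection IDs. It assumes each
collection document looks like { _id, type: "collection", artifacts:
[{ museumId, id }] }; that shape and the local test harness are
assumptions for illustration, not existing code.

```javascript
// Map function for a CouchDB view (sketch). In a design document it would
// be stored as an anonymous function; it is named here only so it can be
// exercised locally. Assumed document shape:
//   { _id, type: "collection", artifacts: [{ museumId, id }, ...] }
// For each collected artifact it emits one row:
//   key = artifact ID, value = ID of the collection document containing it.
function artifactToCollections(doc) {
    if (doc.type === "collection" && Array.isArray(doc.artifacts)) {
        doc.artifacts.forEach(function (artifact) {
            emit(artifact.id, doc._id);
        });
    }
}

// Local stand-in for CouchDB, for testing only: provides a global emit(),
// runs the map function over the given documents, then removes emit() and
// returns the rows sorted by key, roughly as a view query would.
function runMapLocally(mapFn, docs) {
    var rows = [];
    globalThis.emit = function (key, value) {
        rows.push({ key: key, value: value });
    };
    try {
        docs.forEach(function (doc) { mapFn(doc); });
    } finally {
        delete globalThis.emit;
    }
    return rows.sort(function (a, b) {
        return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
    });
}
```

Once saved in a design document, querying the view with
?key="<artifact-id>" would list every collection containing that
artifact, without any shadow document on the artifact side.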

Cheers,

Sveto

Colin Clark wrote:
> Sveto,
>
> Huge apologies for the delay in getting back to you with advice and 
> some code review. I've had a chance to take a look at your code, and I 
> think things are coming together nicely.
>
> I've included some comments and suggestions below. I'm wondering if 
> you'd also be willing to give us a code tour some time this week? That 
> will help me better understand your intentions with the code.
>
> On 7-Jan-10, at 12:12 PM, Svetoslav Nedkov wrote:
>> 1. The integration of the user collection with the artifact view is
>> mostly ready. I'm currently having an issue with a selected DOM
>> element that doesn't seem to accept click events when passed to
>> another component, but I hope I'll be able to fix that quickly
>> tomorrow, so I won't go into the details.
>
> Were you able to fix the issue? If not, tell us more. It sounds 
> interesting.
>
>> 2. To provide a better way of testing this, tomorrow I will create a
>> script that generates empty shadow documents for the artifacts shown
>> on the browse page. This way we will be able to add/remove all the
>> artifacts that we currently see.
>>
>> 3. Another concern I have is the data structure we use. When we last
>> talked about it, we settled on a centralized user database, but I
>> understand that this is not planned, so I intend to remove it
>> completely and replace it with a suitable CouchDB view that will be
>> used only for reading data. This will eliminate the redundancy
>> problem I mentioned in my previous email.
>>
>> I'd like to hear your opinion on the subject.
>
> I'm afraid I've caused terrible confusion around the issue of shadow
> databases and how collections should relate to artifacts. This was
> undoubtedly inspired by some bad code I sketched out while talking
> about the idea of shadows at the dev meeting back in November, and I'm
> really sorry for the confusion. Let me try to clear this up.
>
> Justin is right: the point of shadow documents is to maintain two
> different "namespaces" for writing data. The first contains data 
> sourced from the museum directly, in its original format. Everything 
> in the database from Engage 0.1 fits within this category, since it's 
> all read-only.
>
> The second document, or shadow, stores any contributions from users 
> that apply to a particular museum-sourced entity. So, for example, if 
> we wanted to add an array of user tags to each artifact, we'd write 
> them to the shadow database instead of modifying the museum-derived 
> document directly. That way we can clearly identify where data
> originated, so it can move freely back upstream to the museum if
> needed.
>
> In trying to illustrate this at the dev meeting a while ago, I
> incorrectly suggested that pointers to the collection should be 
> located within the artifact itself. That's not necessary, and it's 
> much simpler just to have collection documents refer to artifacts. You 
> had it right the first time.
>
> Circular references are, as you pointed out in your last email, 
> problematic. Having the artifact/collection "relationship" stored in 
> both documents is unnecessary and does raise the sorts of 
> transactional issues that a well-designed Couch database needn't 
> ordinarily be too concerned with.
>
> So, I'd suggest getting rid of any references to collections within 
> artifact documents. That way, you won't even need to maintain a shadow 
> artifact document at all, and you can simply write to the collection 
> document without concern for shadows or mapping from a museum schema. 
> Just write to your collection document and you're done; this should 
> simplify your code a fair bit.
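As an illustration of "just write to your collection document", here is
a minimal sketch in which collecting and uncollecting read and write only
the collection document; the artifact document is never touched. The
document shape (an artifacts array of { museumId, id } references) and
the names collect(), uncollect() and updateArtifacts() are hypothetical,
and the shared helper shows one way the two operations can reuse a single
update pattern.

```javascript
// Two references point at the same artifact if museum and ID both match.
function sameArtifact(a, b) {
    return a.museumId === b.museumId && a.id === b.id;
}

// Shared update pattern (hypothetical): copy the collection document,
// keeping _id and _rev so CouchDB can detect conflicting writes on save,
// and replace its artifact list with the result of `transform`.
function updateArtifacts(collectionDoc, transform) {
    var updated = Object.assign({}, collectionDoc);
    updated.artifacts = transform((collectionDoc.artifacts || []).slice());
    return updated;
}

// Add a reference; collecting the same artifact twice is a no-op.
function collect(collectionDoc, artifactRef) {
    return updateArtifacts(collectionDoc, function (artifacts) {
        var present = artifacts.some(function (a) {
            return sameArtifact(a, artifactRef);
        });
        return present ? artifacts : artifacts.concat([artifactRef]);
    });
}

// Remove every reference to the given artifact.
function uncollect(collectionDoc, artifactRef) {
    return updateArtifacts(collectionDoc, function (artifacts) {
        return artifacts.filter(function (a) {
            return !sameArtifact(a, artifactRef);
        });
    });
}
```

The returned document would then be saved with a single PUT to its own
URL. Because it still carries the _rev it was read with, CouchDB answers
409 Conflict if someone else updated the collection in the meantime,
which is the only consistency check needed when one document holds the
whole relationship.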
>
> As for your specific question, I agree that we'll probably often have 
> views in Couch that will provide a merged, read-only view of an entity 
> containing data from both the main document and its shadow. We'll also 
> have some infrastructure in our data access layer on the server that 
> takes care of writing to the shadow. It's not something we've worked 
> out yet, but your suggestion of creating shadows on the fly when 
> they're not there sounds like a reasonable approach.
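A minimal sketch of the two ideas in the paragraph above: a merged,
read-only view of a museum document and its shadow, and a shadow created
on the fly when none exists yet. The merge is done here in plain
JavaScript rather than in a Couch view, and the "shadowOf",
"contributions" and "tags" field names are hypothetical.

```javascript
// A fresh, empty shadow for an artifact (hypothetical shape). Nothing is
// saved here; the caller would store it on the first user contribution.
function emptyShadow(artifactId) {
    return { type: "shadow", shadowOf: artifactId, tags: [] };
}

// Create the shadow lazily: use the existing one, or an empty one.
function shadowFor(artifactId, existingShadow) {
    return existingShadow || emptyShadow(artifactId);
}

// Read-only merge. Museum fields are copied unchanged; user contributions
// are grouped under "contributions", so the origin of every value stays
// identifiable. Object.freeze makes the top level of the result read-only
// (it is a shallow freeze).
function mergedView(museumDoc, shadowDoc) {
    var shadow = shadowFor(museumDoc._id, shadowDoc);
    var merged = Object.assign({}, museumDoc, {
        contributions: { tags: shadow.tags.slice() }
    });
    return Object.freeze(merged);
}
```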
>
> The good news is that so far we don't really have a need for shadow 
> documents, so we can sidestep this complexity. I expect in the future 
> we'll probably have to tackle these issues, but for now we needn't 
> sweat it. Sorry for the confusion.
>
>> 4. I think generating a CouchDB unique ID for the user session is a
>> good idea. Just to clarify: will we create a document for the session
>> that can be expanded in the future, or for now just use the
>> functionality that allows us to generate UUIDs?
>
> Not wanting to risk any ambiguity, I think we should treat these as 
> user IDs, rather than session IDs. They won't correspond to any formal 
> session state on the server-side (we don't have session state), and 
> they are really a way for us to keep track of a particular user. Once 
> the designers have resolved how logins will work, I assume that we'll 
> keep track of user login/password information via these ids as well. 
> So, inspired by how you've designed collection documents in the 
> "users" database, here's how I'm thinking we might represent it all:
>
> {
>   type: "user",
>   _id: "<crazy-long-couch-uuid-here>",
>   email: "<not used at first, but perhaps eventually filled in by the user>",
>   collection: {
>     artifacts: [
>       {
>         museumId: "mmi",
>         id: "<crazy-long-couch-artifact-id-here>"
>       }
>     ]
>   }
> }
>
> In effect, it's the same structure that you've laid out, except that 
> the document represents the whole user rather than just the 
> collection. Does this seem like a reasonable approach, or am I missing 
> anything obvious?
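A sketch of creating such a user document. CouchDB's GET /_uuids
endpoint returns a body like {"uuids": ["6e1295ed6c29495e54cc05947f18c8af"]};
the function below takes that body as text and builds an empty user
document with the structure above. The function name and the choice of
null for the unused email field are assumptions.

```javascript
// Build a new, empty user document from the text of a /_uuids response.
function newUserDocument(uuidsResponseText) {
    var uuids = JSON.parse(uuidsResponseText).uuids;
    if (!Array.isArray(uuids) || uuids.length === 0) {
        throw new Error("No UUID in the /_uuids response");
    }
    return {
        _id: uuids[0],
        type: "user",
        email: null,                  // not used at first
        collection: { artifacts: [] } // filled in as the user collects
    };
}
```

The document would then be stored with a PUT to its own URL in the users
database. Fetching an ID first and using PUT, rather than POSTing a
document without an _id, avoids creating duplicate users if a request is
retried.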
>
> So, onto some code review:
>
> * Standalone previewability: Sometimes it's really nice to test a 
> component without needing the server or database running. I couldn't 
> get the MyCollection component to run standalone due to some path 
> problems. I also didn't see any sample data, so you'll probably want 
> to implement that as well. Take a look at the other components or the
> work Boyan has done with Capture for reference. It's a bit of extra 
> work, but really helpful.
>
> * Minor path issue: when I checked out your code, you've got Infusion 
> in a directory called "infusion," but your paths refer to 
> "fluid-infusion." I renamed the directory and it worked fine. To 
> simplify things, I'd suggest just bringing in Infusion as an external. 
> We still need a better way for non-committers to work on release-level 
> code (branching is all we've got at the moment--wish we were using 
> Git), so it's something we'll try to talk about at the dev meeting 
> next week.
>
> * You mount your myCollection data feed and template inside the 
> "/artifacts" URL space. I'm thinking that since these documents may 
> actually represent users, we should mount them as a top level 
> resource. Here's a sketch for now, and then we can consider a more 
> resource-oriented (rather than view-oriented) approach later:
>    User data feed: http://server.org/users/collection.json
>    MyCollection template: http://server.org/users/collection.html
>
> * I'm not fully clear on what's happening in your render() method in 
> the MyCollection.js component. I'm confused about the block where you 
> call fetchTemplates() around lines 122-133. If you're calling 
> reRender(), you should already have the parsed templates and don't 
> need to fetch the raw HTML template again, right?
>
> * Could some of the code in your component--such as getArtifactIds() 
> and the other get...() functions--be implemented as Couch views or 
> model mapping functions instead?
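Regarding the last point above, here is a sketch of getArtifactIds() as
a plain model-mapping function over the proposed user document
(collection.artifacts: [{ museumId, id }]). The optional museumId filter
is a hypothetical addition. Because it has no DOM or server
dependencies, it can be unit tested on its own, and the same logic could
later move into a CouchDB view.

```javascript
// Return the IDs of the artifacts in a user's collection, optionally
// restricted to one museum.
function getArtifactIds(userDoc, museumId) {
    var artifacts = (userDoc.collection && userDoc.collection.artifacts) || [];
    return artifacts
        .filter(function (a) {
            return museumId === undefined || a.museumId === museumId;
        })
        .map(function (a) { return a.id; });
}
```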
>
> The code in your updateDatabase.js file could also use some work.
> Here are a few issues I noticed:
>
> * There's a fair bit of code duplication here. If you take a look at 
> your getCollection(), getCollectionById(), and getShadowArtifact() 
> functions, they share a lot of boilerplate code. It should get
> simpler without shadow artifacts, but perhaps you can factor some of 
> this code out into a single, reusable function? collect() and
> uncollect() also share a pattern. As an aside, this sort of data
> access is now pretty common across all services, so Yura and I are 
> going to dig into some framework code to reduce this code redundancy 
> significantly.
>
> * I think we could be a bit more resource-oriented in our URL design 
> here. Generally, we want mounted handlers to represent a real thing in 
> the system--resources such as artifacts and collections--and then use 
> HTTP methods for operating on those resources. In particular, I wonder 
> if there's a way to implement your collection operations differently. 
> Here's a sketch off the top of my head, but it will need a bit of work to
> think through before implementing:
>
>    http://server.org/users/xyz/collection/artifacts/abc
>      POST adds the artifact identified by the id "abc" to the "xyz" 
> user's personal collection
>      DELETE uncollects the artifact from the user's personal collection
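For the duplication in getCollection(), getCollectionById() and
getShadowArtifact() mentioned above, one possible refactoring is a
single factory that builds the document URL, performs the request and
parses the JSON. The HTTP client is injected as httpGet(url), assumed to
be synchronous and to return { status, body }; that signature, like the
factory's name, is hypothetical and would need to match whatever the
server code actually uses.

```javascript
// Create a getter for documents in one CouchDB database.
function makeDocumentGetter(databaseUrl, httpGet) {
    return function getDocument(id) {
        var response = httpGet(databaseUrl + "/" + encodeURIComponent(id));
        if (response.status === 404) {
            return null; // CouchDB answers 404 for a missing document
        }
        if (response.status !== 200) {
            throw new Error("GET " + id + " failed with status " + response.status);
        }
        return JSON.parse(response.body);
    };
}
```

Each specific getter then becomes a one-liner, for example
makeDocumentGetter("http://localhost:5984/users", httpGet), using
CouchDB's default port.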
>
> I realize there's an asymmetry between this more resource-oriented 
> style of URL and some of our existing conventions. I'd like to move 
> towards a more resource-oriented way over time, but I realize it may
> take some new infrastructure in Kettle as well as a bit of design. 
> Another topic for the dev meeting.
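To make the proposed URL scheme concrete, here is a sketch that only
parses /users/{userId}/collection/artifacts/{artifactId} and maps the
HTTP method to an operation; how it would be mounted depends on Kettle.
The operation names and the return shape are hypothetical. Other methods
on a matching URL get 405 Method Not Allowed, which in HTTP must come
with an Allow header listing the supported methods.

```javascript
var COLLECTION_ARTIFACT_PATH = /^\/users\/([^\/]+)\/collection\/artifacts\/([^\/]+)\/?$/;
var ACTIONS = { POST: "collect", DELETE: "uncollect" };

// Returns null if the path is not a collection-artifact resource,
// a 405 descriptor for unsupported methods, or the operation to perform.
function routeCollectionRequest(method, path) {
    var match = COLLECTION_ARTIFACT_PATH.exec(path);
    if (!match) {
        return null;
    }
    if (!Object.prototype.hasOwnProperty.call(ACTIONS, method)) {
        return { status: 405, allow: Object.keys(ACTIONS).join(", ") };
    }
    return {
        action: ACTIONS[method],
        userId: decodeURIComponent(match[1]),
        artifactId: decodeURIComponent(match[2])
    };
}
```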
>
> Whew, super long email. I hope it's not too much to digest and that
> it's helpful. Don't hesitate to keep up the thread if you have any
> questions or if there are things I'm missing here. I'm really 
> interested in your ideas, suggestions, and alternative designs for any 
> of these issues, too!
>
> Colin
>
> ---
> Colin Clark
> Technical Lead, Fluid Project
> http://fluidproject.org
>