My Collection update
Colin Clark
colinbdclark at gmail.com
Mon Jan 11 04:01:32 UTC 2010
Sveto,
Huge apologies for the delay in getting back to you with advice and
some code review. I've had a chance to take a look at your code, and I
think things are coming together nicely.
I've included some comments and suggestions below. I'm wondering if
you'd also be willing to give us a code tour some time this week? That
will help me better understand your intentions with the code.
On 7-Jan-10, at 12:12 PM, Svetoslav Nedkov wrote:
> 1. The integration of the user collection with the artifact view is
> quite ready, I'm currently having an issue with a selected dom
> element that doesn't seem to accept click events when passed to
> another component, but I hope that I'll be able to fix that for a
> short time tomorrow that's why I won't fill in the details.
Were you able to fix the issue? If not, tell us more. It sounds
interesting.
> 2. To provide a better way of testing this, tomorrow I will create a
> script that generates empty shadow documents for the artifacts that
> are seen in the browse page. This way we will be able to add/remove
> all the artifacts that we currently see.
>
> 3. Also another concern I have is regarding the data structure we
> use. Last talk on the subject we had we settled for a centralized
> user database, but I understand that this is not planned and intend
> to remove it completely, replacing it with a suitable CouchDB view
> that will be used only for getting data. This will eliminate the
> problem with redundancy I've mentioned in my previous email.
>
> I'd like to hear your opinion on the subject.
I'm afraid I've caused terrible confusion around the issue of shadow
databases and how collections should relate to artifacts. This was
undoubtedly inspired by some bad code I sketched out while talking
about the idea of shadows at the dev meeting back in November, and I'm
really sorry for the confusion. Let's see if I can try to clear this up.
Justin is right, the point of shadow documents is to maintain two
different "namespaces" for writing data. The first contains data
sourced from the museum directly, in its original format. Everything
in the database from Engage 0.1 fits within this category, since it's
all read-only.
The second document, or shadow, stores any contributions from users
that apply to a particular museum-sourced entity. So, for example, if
we wanted to add an array of user tags to each artifact, we'd write
them to the shadow database instead of modifying the museum-derived
document directly. That way we can clearly identify where data
originated so it can move freely back upstream to the museum if
needed.
In trying to illustrate this at the dev meeting a while ago, I
incorrectly suggested that pointers to the collection should be
located within the artifact itself. That's not necessary, and it's
much simpler just to have collection documents refer to artifacts. You
had it right the first time.
Circular references are, as you pointed out in your last email,
problematic. Having the artifact/collection "relationship" stored in
both documents is unnecessary and does raise the sorts of
transactional issues that a well-designed Couch database needn't
ordinarily be too concerned with.
So, I'd suggest getting rid of any references to collections within
artifact documents. That way, you won't even need to maintain a shadow
artifact document at all, and you can simply write to the collection
document without concern for shadows or mapping from a museum schema.
Just write to your collection document and you're done; this should
simplify your code a fair bit.
As for your specific question, I agree that we'll probably often have
views in Couch that will provide a merged, read-only view of an entity
containing data from both the main document and its shadow. We'll also
have some infrastructure in our data access layer on the server that
takes care of writing to the shadow. It's not something we've worked
out yet, but your suggestion of creating shadows on the fly when
they're not there sounds like a reasonable approach.
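To make the merged, read-only idea a bit more concrete, here's a rough
sketch of what the merge might look like. The function names and the
"userData" key are placeholders I made up for illustration; none of
this exists in our code, and whether the merge ends up in a Couch view
or in the data access layer is still undecided.

```javascript
// Hypothetical sketch: names and the "userData" key are assumptions.

// Copy own properties from source to target, skipping any named in `skip`.
function copyProperties(target, source, skip) {
    for (var key in source) {
        if (Object.prototype.hasOwnProperty.call(source, key) &&
                skip.indexOf(key) === -1) {
            target[key] = source[key];
        }
    }
    return target;
}

// Build a read-only merged entity. Museum-sourced fields stay at the top
// level; user contributions from the shadow go under a separate key so
// it remains clear where each piece of data originated.
function mergeWithShadow(museumDoc, shadowDoc) {
    var merged = copyProperties({}, museumDoc, []);
    // A missing shadow is treated as empty rather than as an error.
    merged.userData = copyProperties({}, shadowDoc || {}, ["_id", "_rev"]);
    return merged;
}
```

Note that the copies are shallow, which is fine for reading but would
need more care if anything wrote to the merged object.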
The good news is that so far we don't really have a need for shadow
documents, so we can sidestep this complexity. I expect in the future
we'll probably have to tackle these issues, but for now we needn't
sweat it. Sorry for the confusion.
> 4. I think that the idea to generate a CouchDB unique id for the
> user session is a good idea, just to clarify - will we create a
> document for the session that can be expanded in the future or for
> now just use the functionality that allows us to generate uuids.
Not wanting to risk any ambiguity, I think we should treat these as
user IDs, rather than session IDs. They won't correspond to any formal
session state on the server-side (we don't have session state), and
they are really a way for us to keep track of a particular user. Once
the designers have resolved how logins will work, I assume that we'll
keep track of user login/password information via these ids as well.
So, inspired by how you've designed collection documents in the
"users" database, here's how I'm thinking we might represent it all:
{
    type: "user",
    _id: <crazy-long-couch-uuid-here>,
    email: <not used at first, but perhaps eventually filled in by the user>,
    collection: {
        artifacts: [
            {
                museumId: "mmi",
                id: <crazy-long-couch-artifact-id-here>
            }
        ]
    }
}
In effect, it's the same structure that you've laid out, except that
the document represents the whole user rather than just the
collection. Does this seem like a reasonable approach, or am I missing
anything obvious?
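For what it's worth, here's a minimal sketch of how collecting and
uncollecting might work against a user document shaped like the one
above. All of the function names are hypothetical, and it only
manipulates the in-memory document; saving it back to Couch is left
out entirely.

```javascript
// Hypothetical helpers; the names are illustrative, not existing code.

// Create an empty user document. In practice the id would be a UUID
// generated by Couch rather than passed in by hand.
function createUser(id) {
    return {
        type: "user",
        _id: id,
        email: null,
        collection: { artifacts: [] }
    };
}

// Find the position of an artifact reference in the user's collection,
// or -1 if the user hasn't collected it.
function indexOfArtifact(user, museumId, artifactId) {
    var artifacts = user.collection.artifacts;
    for (var i = 0; i < artifacts.length; i++) {
        if (artifacts[i].museumId === museumId &&
                artifacts[i].id === artifactId) {
            return i;
        }
    }
    return -1;
}

// Add an artifact reference, ignoring duplicates.
function collect(user, museumId, artifactId) {
    if (indexOfArtifact(user, museumId, artifactId) === -1) {
        user.collection.artifacts.push({ museumId: museumId, id: artifactId });
    }
    return user;
}

// Remove an artifact reference if it is present.
function uncollect(user, museumId, artifactId) {
    var idx = indexOfArtifact(user, museumId, artifactId);
    if (idx !== -1) {
        user.collection.artifacts.splice(idx, 1);
    }
    return user;
}
```

Since the relationship lives only in the user document, each operation
is a single document write, with nothing to keep in sync on the
artifact side.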
So, onto some code review:
* Standalone previewability: Sometimes it's really nice to test a
component without needing the server or database running. I couldn't
get the MyCollection component to run standalone due to some path
problems. I also didn't see any sample data, so you'll probably want
to implement that as well. Take a look at the other component or the
work Boyan has done with Capture for reference. It's a bit of extra
work, but really helpful.
* Minor path issue: in the code I checked out, Infusion lives in a
directory called "infusion," but your paths refer to "fluid-infusion."
I renamed the directory and it worked fine. To simplify
things, I'd suggest just bringing in Infusion as an external. We still
need a better way for non-committers to work on release-level code
(branching is all we've got at the moment--wish we were using Git), so
it's something we'll try to talk about at the dev meeting next week.
* You mount your myCollection data feed and template inside the
"/artifacts" URL space. I'm thinking that since these documents may
actually represent users, we should mount them as a top level
resource. Here's a sketch for now, and then we can consider a more
resource-oriented (rather than view-oriented) approach later:
User data feed: http://server.org/users/collection.json
MyCollection template: http://server.org/users/collection.html
* I'm not fully clear on what's happening in your render() method in
the MyCollection.js component. I'm confused about the block where you
call fetchTemplates() around lines 122-133. If you're calling
reRender(), you should already have the parsed templates and don't
need to fetch the raw HTML template again, right?
* Could some of the code in your component--such as getArtifactIds()
and the other get...() functions--be implemented as Couch views or
model mapping functions instead?
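To illustrate the last point, here's roughly what a Couch view for
listing a user's artifacts might look like. It assumes the user
document structure I sketched earlier in this email, and the function
name is made up. In CouchDB, emit() is supplied by the view server;
the stand-in below is only there so the map function can be tried
outside of Couch.

```javascript
// Stand-in for CouchDB's built-in emit(), for local experimentation only.
// Inside a real design document, neither `rows` nor this function exists.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Hypothetical map function: one row per collected artifact, keyed by
// the id of the user who collected it. Non-user documents are skipped.
function artifactsByUser(doc) {
    if (doc.type === "user" && doc.collection && doc.collection.artifacts) {
        for (var i = 0; i < doc.collection.artifacts.length; i++) {
            var artifact = doc.collection.artifacts[i];
            emit(doc._id, { museumId: artifact.museumId, id: artifact.id });
        }
    }
}
```

Querying such a view with the user's id as the key would return just
that user's artifact references, which might replace some of the
client-side get...() code.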
I noticed that the code in your updateDatabase.js file could use some
work. Here are a few issues I noticed:
* There's a fair bit of code duplication here. If you take a look at
your getCollection(), getCollectionById(), and getShadowArtifact()
functions, they share a fair bit of boilerplate code. It should get
simpler without shadow artifacts, but perhaps you can factor some of
this code out into a single, reusable function? collection() and
uncollect() also share a pattern. As an aside, this sort of data
access is now pretty common across all services, so Yura and I are
going to dig into some framework code to reduce this code redundancy
significantly.
* I think we could be a bit more resource-oriented in our URL design
here. Generally, we want mounted handlers to represent a real thing in
the system--resources such as artifacts and collections--and then use
HTTP methods for operating on those resources. In particular, I wonder
if there's a way to implement your collection operations differently.
Here's a sketch off the top of my head, but it will need a bit of work to
think through before implementing:
http://server.org/users/xyz/collection/artifacts/abc
POST adds the artifact identified by the id "abc" to the "xyz"
user's personal collection
DELETE uncollects the artifact from the user's personal collection
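As a rough illustration of that URL scheme, here's how a handler might
map a method and path onto an operation. This isn't Kettle code; the
function and the shape of its return value are assumptions of mine,
meant only to show the idea.

```javascript
// Hypothetical routing sketch, independent of any server framework.
var COLLECTION_ARTIFACT =
    /^\/users\/([^\/]+)\/collection\/artifacts\/([^\/]+)$/;

// HTTP methods this resource supports, and what each one means.
var OPERATIONS = { POST: "collect", DELETE: "uncollect" };

// Returns { operation, userId, artifactId } for a supported request,
// or null if the path or method doesn't match.
function routeCollectionRequest(method, path) {
    var match = COLLECTION_ARTIFACT.exec(path);
    if (!match || !OPERATIONS.hasOwnProperty(method)) {
        return null;
    }
    return {
        operation: OPERATIONS[method],
        userId: decodeURIComponent(match[1]),
        artifactId: decodeURIComponent(match[2])
    };
}
```

For example, routeCollectionRequest("POST",
"/users/xyz/collection/artifacts/abc") yields a "collect" operation
for user "xyz" and artifact "abc".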
I realize there's an asymmetry between this more resource-oriented
style of URL and some of our existing conventions. I'd like to move
towards a more resource-oriented way over time, but I realize it may
take some new infrastructure in Kettle as well as a bit of design.
Another topic for the dev meeting.
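Coming back to the duplication in updateDatabase.js for a moment,
here's one possible shape for a shared helper. The names are
hypothetical, and httpGet is an assumed function, passed in by the
caller, that takes a URL and returns the response body as a string
(or null if the document isn't there). It isn't an existing API of
ours.

```javascript
// Hypothetical helper: builds the document URL and parses the response
// in one place, so getCollection()-style functions become one-liners.
// `httpGet` is an assumed, injected function: url -> body string or null.
function makeDocumentGetter(httpGet, baseUrl, database) {
    return function (id) {
        // Couch addresses documents as /<database>/<document id>.
        var url = baseUrl + "/" + database + "/" + encodeURIComponent(id);
        var body = httpGet(url);
        return body ? JSON.parse(body) : null;
    };
}
```

Each of the existing getters could then be created with a single call,
for instance makeDocumentGetter(httpGet, "http://localhost:5984",
"users"), leaving the URL building and parsing in one spot.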
Whew, super long email. Hopefully it's helpful and not too much to
digest. Don't hesitate to keep up the thread if you have any
questions or if there are things I'm missing here. I'm really
interested in your ideas, suggestions, and alternative designs for any
of these issues, too!
Colin
---
Colin Clark
Technical Lead, Fluid Project
http://fluidproject.org