One of the world's coolest free software applications is Zope, the web application server cum object database from Digital Creations. The story of Digital Creations, a consulting company for whom open sourcing the family jewels (Zope) and its development process marked the road to financial stability and continued growth, is well documented (see here and here for more on the corporate history). And the success of Zope itself is easily found all over the web, including in messages on varied lists with strangely over-the-top enthusiasts saying things like "oh my goodness, I get it now, I'm never doing this with anything but Zope again!" including maybe one or two from yours truly (btw, if you've never seen or used Zope, check out Paul Browning's Zope - a Swiss Army Knife for the Web? for a great intro for librarians or university types).
But did you know that some of the key folks behind Zope admit, unabashedly, that they've "always wanted to be a librarian"? In fact, when I met Paul Everitt, CEO of Digital Creations, at LinuxWorld last month, that's exactly what he said. A quick tour of Zope, and especially many of its newest features, reveals that the folks behind Zope (Digital Creations staff and the many contributors to Zope's open source Fishbowl Process for development) think very deeply about many of the same information issues fundamental to librarianship... especially metadata, and the creation, management, and leveraging thereof.
Paul and Ken Manheimer, developer on many Zope projects at Digital Creations (and noted as "Mailman's Savior" in the acknowledgements of everybody's favorite list manager, among other accomplishments and contributions), graciously agreed to be interviewed about recent developments in Zope for oss4lib.
[Note: Our interview took place over a few rounds of mail exchanges/responses, so some reordering and very minor rewording of a few of the questions has been done for flow. None of the text of responses has been altered, and Paul and Ken both reviewed the text before it was posted. Any mistakes or misrepresentations this refactoring might have caused are the author's fault.]
oss4lib: You and Ken both confess to being "closet librarians." Tell us about that. (It's okay to come out of the closet on oss4lib. We understand, we've all been through it. :)
Paul Everitt: Well, I'd put it like this. We both think about information and content in abstract, rich ways. We then obsess over valuable ways to leverage the relationships and organizations within content.
Ken and I have been fond of saying lately that metadata and the relationships in the "space in between" content is itself content.
In Zope, everything is a dynamic object in the database, including metadata and indices. Since content is just marketspeak for object, we actually have a platform to do something about these ideas.
Ken Manheimer: Yes! Computers enable profound increases in the scale of communications, over space and time. Without sufficient measures for organizing all this stuff being cast around, we inundate people as much - or more - than we help them get the answers they're seeking, when they need them. Establishing the "spaces between" the information - the context, the relationships - as explicit content, the system can take the context into account, and we can develop strategies and mechanisms that help fit answers into context. We can help fit collections of answers into stories.
oss4lib: Traditionally, libraries gather metadata and information about content relationships in reference sources (in holdings catalogs as well as acquired encyclopedias, directories, indexes, atlases, union catalogs, etc.), enshrine these in a reference section, and put reference librarians in the "space between" this section and library visitors. Your approach, very cognizant of this traditional model, creates and organizes as much of this information as possible automatically, and allows users to fill in many of the gaps themselves if they choose to behave certain ways.
Manheimer: I think that's the gist of it. The idea is to organize the process and interface so that whatever metadata can naturally fall in place does fall in place. (Note that this is different from doing complex inferencing, eg ai or iterative collaborative filtering. Using collaborative feedback will have a big place, I think, but I shy from elaborate inferencing, at least until we've milked the overt stuff that can be gathered in process...)
oss4lib: In the recent Zope Directions Roadmap, you're making another push to simplify how people approach Zope (e.g. moving away from DTML), and to target your audience further (e.g. restated focus on developers). There must be a difficult balance to maintain when making things better and easier runs the risk of introducing change for some of Zope's most loyal fans. This issue faces libraries every day, especially with longtime users unfamiliar with new interfaces. We usually struggle with such decisions and end up trying to please everybody. How do you make these decisions?
Everitt: In the "fishbowl". That is, we make fairly formal proposals and put them up for a review process. We've struck a nice balance between the power of global collaboration and the coherence of benevolent dictatorship.
People have found the Zope learning curve too steep. In retracing the cause of this problem, we found that Zope tried to be all things to all people. Not knowing your primary audience leads to usability problems. Thus, order number one was to pick an audience for Zope, and then allow Zope extensions to go after other audiences.
This leads to the Zope Directions document you mentioned. Zope is for developers who create useful things for other audiences.
Note that our idea of developer, thanks to Python, differs dramatically from other systems which use low-level systems programming languages like C++ or Java for extension.
oss4lib: Clearly, you pay close attention to what your consulting customers want from you, and to what active members of Zope's open source development want from Zope. As each community grows, how do you manage to keep listening closely to both? In what ways do the ideas you hear overlap?
Everitt: There's a value cycle in our strategy. We tell customers, "We have this Open Source platform with great value being added from developers and companies worldwide that you can tap into." We then have to execute on having a strong, attractive platform for developers to create interesting things like Squishdot, Metapublisher, etc.
Then we turn it back around and explain back to the community how customer engagements are driving things that are clearly important to the platform's viability, such as enterprise scale.
It's worked out very well, although there are times where the choice has to be made, and this almost always means the consulting customer wins. As we've learned from these situations, we've adapted our organizational model to better leverage the synergy. How we're now structuring ourselves is becoming as exciting as the software itself.
oss4lib: The new Content Management Framework (CMF) should appeal to many libraries, especially those wanting to empower patrons to manage their own content and allow customized content views. One of the most interesting things about the CMF is its deep support for Dublin Core (DC), with every object supporting DC descriptions. What led you in this direction?
Everitt: Believe it or not, Mozilla!
I've been doing this information resource and discovery thing for a while, with Harvest in 1993 and CNIDR and the like. I had followed Dublin Core for a while, plus related initiatives such as IAFA.
However, Mozilla was the first time I had seen DC built into a platform. Being tied to RDF nearly made it out of reach for people. But the value of having every object or resource in Mozilla support a standard set of properties was apparent, even for a knucklehead like me. :^)
I'm surprised Dublin Core hasn't become universal amongst CMS vendors. Nah, I take it back, I'm not surprised. :^)
oss4lib: A common frustration with Dublin Core is that it would be all the more powerful in the aggregate if more applications and sites implemented it.
Everitt: Alas indeed! But it's not hard to see why it hasn't taken off. It's hard to get authors to participate in metadata. And when there's nearly no payoff or visible benefit, the incentive is even lower.
RDF has suffered from this same chicken-and-egg problem. It's needed a killer app that simultaneously sparks both supply and demand.
oss4lib: Seeing DC in the CMF gives us hope. :) A likely upside is that if more applications and sites use DC, everyone will clamor for more robust metadata. In what ways are you planning for that next level?
Everitt: I believe Ken would agree that the next area of interest for us over the next six months is the "space in between" content.
Manheimer: Yes! There's a lot of metadata that can be inferred on the basis of process and content.
For instance, we can identify the "lineage" of a document according to the document from which it was created. We can harvest the actions of visitors, like site-bookmarking, commenting, and rating documents, to glean orientation info for subsequent visitors. We can infer key concepts from the content, eg, common names (in the wiki, WikiNames).
Overall, we can reduce the burden on the content author and editors to fill in metadata when it can be inferred from process and content.
Everitt: To illustrate, I'll go back to one of the eureka moments that Ken and I had several months ago. We've been pretty big consumers and contributors to the Wiki movement, which on the surface is the unapologetic antithesis of librarianship. That is, Wiki really tries to say, "We'll lower the bar so far, you'll always jump over it."
At one point, though, I became concerned that we were building an alternative CMS with our WikiNG efforts, so Ken and I sat down and tried to plan ways to converge Wiki and CMF. We listed the things we liked about Wiki, what were the real innovations, and discussed ways to converge these innovations into the CMF.
We found out that one of the most attractive areas of Wiki was the way it assembled relationships and meaning from a corpus of slightly-structured information. For instance, Wikiwords (the automatic hyperlinks generated from CamelCase words) not only give a system for regular web hyperlinks, they also give a system for the reverse (what pages point to this page).
In fact, the Wikiword system is a self-generating glossary that distills out important concepts in a corpus. And in Zope, these Wikiwords could become objects themselves. That is, they could become content.
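As a rough illustration of the Wikiword idea described above, here is a minimal Python sketch that pulls CamelCase words out of page text and builds the reverse index (which pages mention a given Wikiword). It is not taken from Zope or any wiki engine: the regular expression, the function names, and the sample pages are all assumptions made for the example, and real wikis apply their own, stricter rules.

```python
import re
from collections import defaultdict

# Assumed Wikiword pattern: two or more capitalized runs joined together,
# e.g. "ZopeHotFixes". This is an illustrative rule, not any wiki's real one.
WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def extract_wikiwords(text):
    """Return the distinct Wikiwords in text, in order of first appearance."""
    seen = []
    for word in WIKIWORD.findall(text):
        if word not in seen:
            seen.append(word)
    return seen


def build_backlinks(pages):
    """Map each Wikiword to the sorted names of the pages that mention it."""
    backlinks = defaultdict(set)
    for name, text in pages.items():
        for word in extract_wikiwords(text):
            if word != name:  # a page mentioning itself is not a backlink
                backlinks[word].add(name)
    return {word: sorted(sources) for word, sources in backlinks.items()}


# Hypothetical two-page corpus.
pages = {
    "FrontPage": "See ZopeHotFixes and ZopeProxyDownstreamError.",
    "ZopeProxyDownstreamError": "The fix is listed at ZopeHotFixes.",
}
backlinks = build_backlinks(pages)
```

The keys of the resulting dictionary act as the self-generating glossary, and each value is that term's list of backlinks.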
This applied equally to the "backlinks" idea (or lineage) that Ken added to our Wiki software. [Manheimer: A small correction - lineage is actually different than "backlinks", the latter are common to all wikis. Read on.] If you edit Page A and put in a Wikiword that leads to the creation of Page B, then you have a relationship: Page A -> Page B. If you then edit Page B to create Page C: Page A -> Page B -> Page C. The backlink information itself could become content, thanks to the relationships.
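The Page A -> Page B -> Page C chain can be sketched in a few lines of Python. This is only an illustration of the concept, assuming each page remembers the single page it was created from; the `LineageTracker` class and its method names are hypothetical and are not the WikiForNow implementation.

```python
class LineageTracker:
    """Records, for each page, the page it was created from (its parent)."""

    def __init__(self):
        self._parent = {}

    def create_page(self, name, created_from=None):
        """Register a new page, optionally noting the page that spawned it."""
        if name in self._parent:
            raise ValueError(f"page already exists: {name}")
        if created_from is not None and created_from not in self._parent:
            raise KeyError(f"unknown parent page: {created_from}")
        self._parent[name] = created_from

    def lineage(self, name):
        """Return the chain of ancestors from the root page down to name."""
        chain = []
        current = name
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        chain.reverse()
        return chain

    def reparent(self, name, new_parent):
        """Move a page under a different parent, refusing to create a cycle."""
        if name not in self._parent:
            raise KeyError(f"unknown page: {name}")
        if name in self.lineage(new_parent):
            raise ValueError("reparenting would create a cycle")
        self._parent[name] = new_parent


tracker = LineageTracker()
tracker.create_page("PageA")
tracker.create_page("PageB", created_from="PageA")
tracker.create_page("PageC", created_from="PageB")
```

Because the parent links are stored as ordinary data, they can be displayed, queried, or edited like any other content.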
Manheimer: The idea is proposed here: zwiki.org/WikiStructuringIdeas. You can see it in action in the wikis on zope.org and on joyful.com, the home of zwiki. See dev.zope.org/Wikis/DevSite/Projects/WikiForNow/RegulatingYourPages for one interesting exploitation of lineage, and see its parent, WikiForNowDevelopments (linked in the header bar lineage display), for an overview of our wiki development features...
Everitt: Neither the Wikiword nor the lineage are part of the content. They exist in between the content. But they are as powerful as the content, and in fact, they can be treated with some of the same services you would apply to content in a CMS.
oss4lib: At Yale we used Wiki very successfully for documenting several project discussions, but we also experienced many of the common problems with wiki (e.g. who wrote what, how do you track changes, how do you preserve ideas, etc.). What are some other important improvements we should look for from the WikiForNow effort, and what else should we look for from WikiNG?
Everitt: WikiForNow, thanks almost exclusively to Ken's perseverance, illustrates how a smarter system can address the common problems. Without, hopefully, throwing the baby out with the bathwater. Each of the three things that you mentioned is in WikiForNow.
However, they get there in WikiForNow by tapping into infrastructure that is shared amongst all content in Zope or in the CMF. For me, WikiNG is more about devolution rather than evolution. That is, take the zen of Wiki and the features of Wiki and make them pervasive beyond Wiki. That means that all content gains Wiki Zen.
Manheimer: Classic wiki shows many things well worth doing - eg, WikiWord vocabulary, backlinks, recent changes, etc. It also manifests an outstanding *way* of doing them - low impedance/low complication operation and authoring - that we will only be able to achieve in the general realm if we use a smart, discerning framework. From my viewpoint, having recently joined the CMF effort, I think it is becoming just such a framework. I think we will be able to generalize the classic wiki features, and our organizational strategies/extensions, more globally and across a richer, comprehensive range of content. I'm excited about it!
Everitt: It remains to be seen whether this model can achieve its goal without losing the simplicity that makes Wiki so pervasive. But let me describe a thought scenario and see if it makes sense to you...
Email and news. Lots of content blows by, a continuous flow of wisdom left almost completely untapped. Email isn't content.
However, let's say that some smart mailing list management software (such as Mailman) did a little bit more than relay mail and archive a copy on disk. Let's say it also shoved a copy of the message into a content management system, which converted relevant RFC822 headers into Dublin Core, indexed the contents, etc. Just for fun, let's call that CMS, well, the CMF. :^)
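To make the header conversion concrete, here is a minimal Python sketch using the standard library's `email` package. The mapping from mail headers to Dublin Core elements is an assumption chosen for illustration, not something specified by Mailman or the CMF, and `email_to_dublin_core` is a hypothetical helper name.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Assumed header-to-element mapping; a real system would choose its own.
HEADER_TO_DC = {
    "Subject": "title",
    "From": "creator",
    "Date": "date",
    "Message-ID": "identifier",
}


def email_to_dublin_core(raw_message):
    """Build a small Dublin Core style record from a raw mail message."""
    msg = message_from_string(raw_message)
    record = {"format": msg.get_content_type()}
    for header, element in HEADER_TO_DC.items():
        value = msg.get(header)
        if value is None:
            continue
        if header == "From":
            # Prefer the display name, fall back to the bare address.
            realname, address = parseaddr(value)
            value = realname or address
        elif header == "Date":
            try:
                value = parsedate_to_datetime(value).date().isoformat()
            except (TypeError, ValueError):
                pass  # keep the raw header text if the date cannot be parsed
        record[element] = value
    return record


# Hypothetical message for demonstration.
raw = (
    "From: Joe User <joe@example.com>\n"
    "Subject: Proxy errors behind Apache\n"
    "Date: Mon, 05 Mar 2001 10:15:00 -0500\n"
    "Message-ID: <abc123@example.com>\n"
    "\n"
    "Hello Zope mailing list.\n"
)
record = email_to_dublin_core(raw)
```

The resulting dictionary could then be handed to whatever indexing or cataloging layer the content management system provides.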
So in real time people could do full-text and parameterized searches. Big deal.
However, let's say the CMF also applied some of the ideas above to email. For instance, threading relationships in email could translate to backlinks/lineage, from which you could make inferences.
But let's take it a *huge* step further. Let's say that a small portion, perhaps 1%, of the people on the mailing list committed themselves to being Knowledge Base Contributors. That is, before sending their email, they observed a couple of small conventions:
a. Using RFC822-style headers at the top, as is done in CMF's structured text, they add targeted cataloging data.
b. In the text of their message they use Wikiwords.
For instance, an email message in response to a bug report might look like this:
"""
Subject: apache, mod_proxy, timeout, bug report
Description: A bug report on using Zope behind Apache 2.0a9 and its new mod_proxy code causes the previously-reported ZopeProxyDownstreamError.
Rating: useful

I heard Ken describing the other day at his ZopeProxyDownstreamError page that others had experienced this error. There's a fix available at the ZopeHotFixes page.

Joe User wrote:
> Hello Zope mailing list. I am using Zope behind the latest Apache 2
> alpha and am getting proxy server errors. Has anyone else seen this?
"""
Observe that a Wikiword was used in the headers as well as the body. There was also an extra header (rating) that isn't part of Dublin Core.
So all we ask is that a very small percentage of people use this system, and the smart mailing list server will munch the headers at the top of the email message before relaying them. Not a very high bar to jump over.
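The "munching" step might look something like the following Python sketch, which peels header-style lines off the top of a message body, keeps the rest for relaying, and collects any Wikiwords as concepts. The function name, the header pattern, and the Wikiword pattern are assumptions for illustration and do not describe Mailman or CMF behavior.

```python
import re

# Assumed patterns: "Name: value" lines and CamelCase Wikiwords.
HEADER_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)$")
WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def munch_annotations(body):
    """Split a message body into (annotations, relayed_text, concepts)."""
    lines = body.splitlines()
    annotations = {}
    index = 0
    # Consume consecutive header-style lines from the top of the body.
    while index < len(lines):
        match = HEADER_LINE.match(lines[index])
        if match is None:
            break
        annotations[match.group(1).lower()] = match.group(2).strip()
        index += 1
    # Drop the blank separator line after the annotations, if present.
    if annotations and index < len(lines) and not lines[index].strip():
        index += 1
    relayed_text = "\n".join(lines[index:])
    # Wikiwords count as concepts whether they appear in headers or body.
    concepts = sorted(set(WIKIWORD.findall(body)))
    return annotations, relayed_text, concepts


body = (
    "Subject: apache, mod_proxy, timeout, bug report\n"
    "Description: Zope behind Apache 2.0a9 causes ZopeProxyDownstreamError.\n"
    "Rating: useful\n"
    "\n"
    "There's a fix available at the ZopeHotFixes page.\n"
)
annotations, relayed_text, concepts = munch_annotations(body)
```

A body with no leading header lines passes through unchanged, so contributors who skip the conventions are unaffected.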
Thanks to the threading response relationships, nearly every email message in the corpus will be within one or two relations from something manually annotated.
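One way to check that claim against a real archive is a breadth-first search over the reply relationships, starting from the manually annotated messages. The Python sketch below assumes the archive is available as a mapping from message id to the id it replies to; the function name and sample data are hypothetical, and whether most messages really land within one or two hops depends on the list.

```python
from collections import defaultdict, deque


def distance_to_annotated(in_reply_to, annotated):
    """Return hop counts from each reachable message to the nearest annotated one.

    in_reply_to maps a message id to its parent id (or None for thread roots).
    Reply links are followed in both directions.
    """
    neighbors = defaultdict(set)
    for message, parent in in_reply_to.items():
        if parent is not None:
            neighbors[message].add(parent)
            neighbors[parent].add(message)

    # Multi-source breadth-first search: every annotated message starts at 0.
    distance = {message: 0 for message in annotated}
    queue = deque(annotated)
    while queue:
        current = queue.popleft()
        for other in neighbors[current]:
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


# Hypothetical archive: m2 and m4 reply to m1, m3 replies to m2, m5 stands alone.
in_reply_to = {"m1": None, "m2": "m1", "m3": "m2", "m4": "m1", "m5": None}
distance = distance_to_annotated(in_reply_to, annotated={"m2"})
```

Messages missing from the result (such as the unrelated thread `m5` here) have no path to any annotated message.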
You could then provide tools that treated the relations and the concepts as content, allowing reparenting and cleaning up the vocabulary.
oss4lib: Implicit in this is knowing that in a given community a small subset of folks will self-select into a group of detail-obsessives working to help the others find and manage context-relevant information. In libraries, it's the catalogers; in the general public it's folks adding to IMDB or moderating MusicBrainz; in the hacker community, it's folks writing How-Tos, guiding free software projects, moderating slashdot, and so on.
Manheimer: Truly, the power of our species is collaboration. That's why computer communications are making so many fundamental waves - they enable quantum leaps in collaboration scale, immediacy, and intricacy. We're all only gradually learning to harness that potential. I think the librarian sensibilities are key because they're about systematizing the advancements so they scale...
oss4lib: A wiki "feature" that made a few librarian colleagues cringe was that its mutable, dynamic nature was the only possible state. It was agreed by all that a great function would be to enable offloading a static snapshot of a wiki as a set of properly hyperlinked html pages. Is it possible now to preserve a Wiki this way?
Everitt: Sure, if that's what people want. wget will easily snarf down a copy of a site.
But that's only one solution to the problem. A better solution is a better system, one in which access control is possible and access to previous versions (history) is there.
Manheimer: RegulatingYourPages, mentioned above, details this.
oss4lib: It sounds like the future of Wiki overlaps in many ways with your plans for robust metadata support.
Everitt: As hinted above, we already have RFC822-style headers in Wiki. For instance, if you edit a Wiki page from an FTP client like Emacs/ange-ftp, you'll see the Wiki seatbelt inserted at the top of the page.
More important, the CMF will converge with our Wiki efforts. A Wiki page will have all the web-based and text-based authoring benefits and metadata of a CMF document. And hopefully, a CMF document will have all the sublime interconnecting that you see in Wiki.
oss4lib: The no-longer-active connection with Fourthought and 4Suite was very promising in this area; while we can always build tools that use 4Suite's 4RDF in conjunction with Zope, a few of us were hoping for deeper integration. What's the outlook for general support for RDF in Zope?
Everitt: Hmm, good question. We have an ongoing dialog with Rael Dornfest from O'Reilly. I'd be interested as well in some of your thoughts on the subject.
oss4lib: Btw, we found some old postings from you at your old .mil address, in the context of GILS. So it's clear that you've been thinking about metadata on the web for a long time.
[Everitt: Wow, you librarians have a long memory. :^) I'm embarrassed to think what I said back then. Well, you know, I was younger then, it was a crazy time, I didn't know it was a felony...oh, wrong embarrassment. :^)]
What's your assessment, in early 2001, of how much we've progressed overall in the metadata area? What are the most important priorities, and is there a holy grail?
Everitt: I don't think we've made an inch of progress in the mainstream, meaning outside of the library science discipline of the already converted.
Nearly everyone I meet (besides programmers) uses Word. They use Word even when they shouldn't. But almost none of them know that File->Properties exists.
Metadata, through <meta> tags, has been built into the Web since at least HTML2. So what percentage of web pages have anything other than the "generator" metatag spewed automatically and unknowingly by FrontPage? Essentially zero.
I have hope for incremental breakthroughs like the CMF, which brings a "Wow, that's the way _everything_ should be done" response to CMS cynics. However, it's still trying to transform a mature, continuous market.
I think this takes a discontinuity, a disruptive breakthrough. Lately Rael has been talking about doing P2P for syndication. P2P could be the kind of transformative breakthrough for DC and RDF. Without a standard vocabulary across verticals (music, etc.), P2P will be another thousand islands, which dramatically lowers the utility. Unlike web pages, which generally want content to be broadcast and rendered, P2P wants content to be exchanged. This model demands interoperable content.
oss4lib: What more can librarians do to contribute our experience and insight to the broader software community regarding metadata issues?
Everitt: Uhh, prevent knuckleheads like me from repeating historical mistakes. It's doubtful that a disruptive technology for metadata will come out of the ranks of librarians. However, if librarians keep an open mind and don't fall prey to sacrificing the larger victory by clinging to a narrow agenda, then they can spot a winner and help guide it to adulthood.
Many thanks go to Paul and Ken for their willingness, patience, and responsiveness during the interview process.