"HyTime Groves: the progenitor of Topic Maps"
(http://www.coolheads.com/SRNPUBS/groveProgenitorOfTMs.txt)

Steve Newcomb
24 March 2010

Abstract: This response to a posting in Patrick Durusau's blog explains why gaining an understanding of the intellectual roots of the Topic Maps paradigm increases one's estimate of its scope.

NOTE: The blog posting's WWW address is http://tm.durusau.net/?p=44&cpage=1#comment-205

NOTE: An index of writings of Steve Newcomb can be found at http://www.coolheads.com/publications.htm

-----------------------------------------------------------------

Patrick's quotation from ISO 13250:2000 moves me to share some reflections that some of his readers may find stimulating.

The bulk of the information-interchange power of ISO 13250:2000 Topic Maps was deliberately omitted from the XTM Specification, from which today's Topic Maps Data Model (TMDM) is derived. Along with other topic mapping pioneers, including Michel Biezunski, I encouraged and assisted in that specialization. Additional specializations were introduced by well-intentioned implementers of XTM, who had investments to protect, investors to please, and products to move into the marketplace. Today, of all the parts of ISO 13250 in its current state, only its Reference Model (Part 5) has a frame of reference that extends beyond TMDM's relatively confining perimeter.

It's important to me that our thinking not be confined to TMDM, and many of us are still fascinated by the original scope of 13250:2000: facilitating the amalgamation of master indexes of diverse corpora from partial indexes that are ontologically and taxonomically independent of each other. By contrast, to use today's TMDM-oriented tools, one must view all information through the taxonomic lens of TMDM. Gratifying though TMDM's success is, that success, and the scope of TMDM itself, is dwarfed by the scope of what we were attempting to accomplish when we drafted 13250:2000.
Patrick's quotation from 13250:2000, and particularly the sentence:

  "There are no restrictions on the kinds of information that may be referenced by an identity attribute."

reminds me how much 13250:2000 depended on normative references to the ISO/IEC 10744:1997 (and :1992) "HyTime" standard to convey its intent. All of the kinds of information for which HyTime defined interchange syntaxes are included in that sweeping "no restrictions" statement, and those kinds of information were very prominent in the minds of 13250's drafters and reviewers. (That prominence was no coincidence!) Here's one of the 24 normative references to HyTime in 13250:2000:

  "The definitions provided in [...] ISO/IEC 10744:1997 (including Amendment 1) shall apply to this International Standard."

So, in order to glimpse the intended scope of 13250:2000 with any accuracy, I think one really needs to know a thing or two about HyTime.

HyTime pioneered the idea of formally standardizing a way to interchange -- and to exploit in unanticipated contexts -- strategies for positively identifying, and for regarding as exactly the same for all purposes of linking, scheduling, etc., certain classes of subjects of conversation, the classes being:

  * information components,
  * information addresses,
  * abstract extents in n-dimensional finite coordinate spaces,
  * mappings among such spaces,
  * semantic-bearing relationships,
  * namespaces,
  * and more.

In HyTime, all of these subjects are notionally represented not by "topic information items" (or "topic links" in 13250:2000), but instead by nodes in "Graph Representations of Property Values" called "groves". The HyTime "grove" idea establishes a way to endow the components of information objects (etc.) with identities, and with addresses that leverage those identities in whatever way(s) may be desired. Groves enable all information components to be addressed without first having to add metadata to them.
For example, in a grove of an SGML document, a given element is addressable regardless of whether it has an ID attribute, and adding an ID attribute to it only makes it addressable in yet another way. And elements are just one kind of addressable information component; in an extreme (and usually absurd) case, a grove could include a node that represents a given whitespace character in an XML start tag. "To build a grove" means "to view the information as a graph of nodes constructed in whatever formal and deterministic ways meet the requirements of whatever the intended applications may be." Everything -- every syntactic and/or semantic component of an instance of SGML or any other notation -- can be endowed with identity and addressability, and therefore it can play a role in any kind of hyperlink. In the grove paradigm, the identity of a component can be defined or addressed in terms of the identity of any other component, or even in terms of the identities of all of the other components and their relationships to it. Or in any other way. In grove-land, you get to choose (and even to design, if you like) how the whole information object will be viewable as a graph. According to HyTime, the way in which you are choosing to view it is formally defined by an interchangeable "Property Set" -- documentation about, and structural constraint specifications on, the view of the information that you have chosen to use. For the most part, a Property Set defines classes of nodes. In a grove that conforms to a given Property Set, each instance of each class represents an instance of some corresponding class of subjects. And in a grove, every subject is either a piece of information, or a semantic derived in a defined manner from one or more subjects that are pieces of addressable information. 
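The addressing idea can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not HyTime's own machinery: it builds a grove-like tree from an XML (not SGML) document with the standard `xml.etree.ElementTree` parser, uses three hypothetical node classes (`element`, `attribute`, `text`) in place of a real Property Set, and assumes the ID attribute is literally named `id`. A structural path of positions in each element's `content` property reaches any element, with or without an ID; the ID index is merely a second way to reach the same node.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    # Each node is an instance of a node class ("element", "attribute", "text")
    # from a hypothetical toy view, not the real SGML Property Set.
    node_class: str
    props: dict = field(default_factory=dict)

def build_grove(elem: ET.Element) -> Node:
    """Build a grove node for an ElementTree element, recursively.
    Non-whitespace character data becomes "text" nodes."""
    attributes = [Node("attribute", {"name": k, "value": v})
                  for k, v in elem.attrib.items()]
    content = []
    if elem.text and elem.text.strip():
        content.append(Node("text", {"chars": elem.text}))
    for child in elem:
        content.append(build_grove(child))
        if child.tail and child.tail.strip():
            content.append(Node("text", {"chars": child.tail}))
    return Node("element", {"gi": elem.tag, "attributes": attributes,
                            "content": content})

def resolve_path(root: Node, path: tuple) -> Node:
    """Structural address: follow positions in the "content" property from the
    root. Needs no ID attribute (or any other added metadata) on the target."""
    node = root
    for position in path:
        node = node.props["content"][position]
    return node

def id_index(root: Node) -> dict:
    """An optional second way in: element nodes indexed by an "id" attribute."""
    index, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.node_class != "element":
            continue
        for attr in node.props["attributes"]:
            if attr.props["name"] == "id":
                index[attr.props["value"]] = node
        stack.extend(node.props["content"])
    return index

doc = ET.fromstring('<memo><to>Patrick</to><body id="b1">Groves <em>first</em>.</body></memo>')
grove = build_grove(doc)
to_node = resolve_path(grove, (0,))        # <to> has no ID, yet it is addressable
body_node = resolve_path(grove, (1,))      # <body> is addressable structurally...
assert id_index(grove)["b1"] is body_node  # ...and, via its ID, in another way
```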
An example of such a derived semantic is a property defined in the "HyTime Property Set" whose value is, in effect, a dictionary of the nodes that are addressed by other nodes in some specified corpus.

Groves were the prototypes of Topic Maps, and the Topic Maps idea is no more or less than a generalization of the Grove idea. Every grove node represents a very specific, formally identified subject of conversation. Indeed, the only differences between HyTime Groves and Topic Maps, as defined in the Topic Maps Reference Model (TMRM), are constraints on groves that are relaxed in topic maps:

(1) In HyTime Groves, there are only a few valid property value types, whereas in TMRM, the types of the values of properties are unconstrained.

(2) In HyTime Groves, no grove (and no node's properties) can be defined by more than one monolithic Property Set, while in TMRM, a given single node can have instances of properties whose classes were defined independently, with no cooperation or mutual understanding among their definers.

(3) A HyTime Grove may or may not be acyclic, but it is always hierarchical in that there is always a root node. There is no such constraint on a Topic Map. A Topic Map may or may not be hierarchical, and it may or may not have a root node.

(4) Every node in a HyTime Grove represents a subject which is always some piece of information. In a TMRM Topic Map, there is no such constraint on the subjects that nodes can represent.

(5) Every node in a HyTime Grove is an instance of some node class that is defined in the grove's Property Set. By contrast, there are no node classes in the topic map graphs described in TMRM. Or, maybe it would be clearer to say that in TMRM there is only one node class, "subject proxy", and that all subject proxies are instances of it.
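Differences (1), (2), (4), and (5) can be contrasted in a minimal Python sketch. Every name here (`PropertySet`, `GroveNode`, `SubjectProxy`, the `legend:` identifiers) and the tuple of allowed value types are hypothetical simplifications, not HyTime's actual value types or any API defined by TMRM; difference (3), the root node, is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical, simplified stand-in for the few value types a grove allows.
GROVE_VALUE_TYPES = (str, int, bool, list)

@dataclass
class PropertySet:
    name: str
    node_classes: dict  # node class name -> set of property names it defines

@dataclass
class GroveNode:
    property_set: PropertySet  # difference (2): exactly one Property Set governs
    node_class: str            # difference (5): a class that Property Set defines
    props: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        allowed = self.property_set.node_classes.get(self.node_class)
        if allowed is None:
            raise ValueError(f"{self.node_class!r} is not a class in {self.property_set.name}")
        for name, value in self.props.items():
            if name not in allowed:
                raise ValueError(f"{name!r} is not defined for {self.node_class!r}")
            if not isinstance(value, GROVE_VALUE_TYPES):  # difference (1)
                raise TypeError(f"{name!r} has a value type groves do not allow")

@dataclass
class SubjectProxy:
    # TMRM-style: a single kind of node. Each property key pairs a legend
    # identifier with a property name, so independently written legends can
    # each add properties to the same proxy, and values may be of any type.
    props: dict = field(default_factory=dict)

    def add(self, legend: str, name: str, value: Any) -> None:
        self.props[(legend, name)] = value

toy_sgml = PropertySet("toy-sgml", {"element": {"gi", "content"}})
memo = GroveNode(toy_sgml, "element", {"gi": "memo", "content": []})

shoes = SubjectProxy()  # difference (4): the subject need not be a piece of information
shoes.add("legend:wardrobe", "owner", "Minnie Mouse")
shoes.add("legend:inventory", "sizes", (5, 6))
```

In the grove, a tuple value, or a node class missing from the single governing Property Set, is rejected; the proxy accepts properties from two unrelated legends, with values of any type.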
In effect, of course, there are *subject* classes in Topic Maps, and the class membership(s) of each subject are revealed by the classes and values of the properties of the corresponding topic (aka "subject proxy").

Thus, all HyTime groves are easily seen as TMRM Topic Maps; any remaining differences are merely terminological. HyTime groves consist of nodes, the nodes have properties, the properties are instances of user-defined classes of properties, the property classes are disclosed (the legend is the Property Set), and every node's purpose is to serve as a proxy for a single subject of conversation.

With all that in mind, let's return to those words in 13250:2000:

  "There are no restrictions on the kinds of information that may be referenced by an identity attribute."

This was a conscious reference to all of the identity- and addressability-endowment power of the HyTime "grove" paradigm, among all the other possibilities. The intent was to allow Topic Map authors the freedom to decide not only what their subjects are, but also the subject-identity-invoking techniques embodied in the information referenced by the "identity" attributes of <topic>s. The referenced information could be, for example, a node in a grove, and thus the entire semantic-loading and subject-sameness apparatus of HyTime Property Sets, Grove Plans, Architectural Forms, Scheduling, Mapping, Activity Policy Tracking, and much more could be brought to bear, using any combination of HyTime modules.

The very next paragraph of 13250:2000, immediately after Patrick's quote, says:

  "NOTE 18 The information referenced by an identity attribute may or may not take the form of a topic link in a topic map document..."

Among other things, this note underlines the idea that the referenced subject descriptor's context is important in understanding the identity of the subject being invoked by the reference.
If the referent is a topic, then what is being referenced is the *subject represented by the topic* -- something that may not be knowable without understanding the referenced topic's context in its own topic map. This idea is further clarified later in 13250:2000:

  "Similarly, if the identity attribute references one or more topic links, topic map processing applications must regard the referencing topic link, and all the referenced topic links, as having one and the same subject, and therefore they may all be merged."

But what if the topic map author needed to refer not to the subject of some <topic>, but rather to the syntactic SGML element that is that <topic>? That is, what if that particular instance of a <topic> element was supposed to be the *subject* of the referring topic?

The answer to this question was not explicit in 13250:2000, but it was implicit in the "no restrictions" formula and in the normative references to HyTime. I, among others, assumed that the identity attribute would refer not to the ID of the <topic> (because, according to 13250:2000, that would always be a reference to the <topic>'s subject, as we have just seen), but instead to that <topic>'s corresponding grove node. This works because the subject of the grove node that represents the element is not the subject being represented by the <topic>, but rather the element itself, considered as an instance of an SGML syntactic construct. The two referents (a <topic> element vs. a grove node whose subject is the same element) have different semantics by virtue of their different contexts. In the context of an SGML grove, the subject of a node is always an instance of an SGML syntactic construct, and it can't be anything else. In the context of a Topic Map, however, the subject of a node can be anything at all, including but not limited to an instance of an SGML syntactic construct. It could be something wildly different from an instance of a syntactic construct, such as Minnie Mouse's high-heeled shoes.
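The merging rule quoted from 13250:2000, together with the grove-node alternative, can be sketched as a union-find over subjects. This is a minimal Python sketch, not the 13250:2000 interchange syntax: identity references are modeled as hypothetical `Ref` values of kind `"topic"` (the ID of a <topic>, invoking that topic's subject) or `"grove-node"` (a hypothetical address of the grove node for a <topic> element, invoking the element itself). Treating two topics that reference the same grove node as having one subject is also an assumption of the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    kind: str    # hypothetical: "topic" or "grove-node"
    target: str  # a <topic> ID, or a hypothetical address of that element's grove node

class UnionFind:
    def __init__(self) -> None:
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b) -> None:
        self.parent[self.find(a)] = self.find(b)

def merge_topics(identity_refs: dict) -> list:
    """identity_refs maps every topic ID to the Refs in its identity attribute.
    Returns groups of topic IDs that must be regarded as having one subject."""
    uf = UnionFind()
    for topic_id, refs in identity_refs.items():
        me = ("topic", topic_id)
        uf.find(me)
        for ref in refs:
            if ref.kind == "topic":
                # Referencing a topic invokes that topic's subject: merge them.
                uf.union(me, ("topic", ref.target))
            elif ref.kind == "grove-node":
                # Referencing the grove node invokes the element itself, a subject
                # distinct from whatever the element's own topic represents.
                uf.union(me, ("element", ref.target))
            else:
                raise ValueError(f"unknown reference kind {ref.kind!r}")
    groups = {}
    for topic_id in identity_refs:
        groups.setdefault(uf.find(("topic", topic_id)), set()).add(topic_id)
    return sorted(groups.values(), key=min)

refs = {
    "t1": [],                                 # a topic about some subject
    "t2": [Ref("topic", "t1")],               # same subject as t1, so merged
    "t3": [Ref("grove-node", "t1-element")],  # subject: the <topic> element t1 itself
    "t4": [Ref("grove-node", "t1-element")],  # the same element, so merged with t3
}
groups = merge_topics(refs)  # t1 and t2 form one group; t3 and t4 form another
```

The element-keyed entries keep the two kinds of referent apart: t3 never merges with t1, even though both are "about" the same piece of markup in different senses.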
You may not be able to tell what the subject of a <topic> actually is without looking more deeply at the topic map in which it appears, because the identity of the <topic>'s subject may depend on the identities of the subjects of other <topic>s in that map, just as the identity of the subject of a grove node may depend on the identities of the rest of the information components that have been node-ified in the grove.

Steve Newcomb
24 March 2010