| Semantic rules inside Tagsistant |
| Tuesday, 19 August 2008 14:49 |
Tagsistant is very useful for cataloguing files, but the whole burden of adding tags falls on the user. Even worse, if the user someday decides that a tag is a subset of a more comprehensive tag, every file tagged with the first tag must also be tagged with the second one, which is tedious and inefficient. Full semantic support implies that tags are related to each other by a set of relations which describe their dependencies.

Current semantic formats are based on OWL, a complex framework derived from RDF for describing fully consistent ontologies. After looking at OWL for a while, I still can't tell whether it's the right thing for Tagsistant. OWL is without doubt the most widely accepted and most promising standard. But it's also very complex, and without clear evidence that Tagsistant (a small application meant to work even on PDAs) requires that level of complexity, choosing it could be a strategic error. On the other hand, choosing something else, or even reinventing the wheel, could be an even worse choice.

What should I do? So far, nothing. The first thing to do is to draft some ideas about the possible relations between tags inside Tagsistant, and only later choose one approach or the other. So the very first kinds of relations I can think of are:
To describe relations of this kind, a complex language like OWL may be overkill: too heavy to implement and mostly useless. But what about expanding the ontology and adding many more relations? Then having OWL embedded in Tagsistant could turn out to be the right thing. What do you think? Send me your opinion by mail. You can find my address at the top of this page.

Next step: my own way

No, I don't want to reinvent the wheel, I assure you. But I've decided that a standard is a good thing only as long as it's motivated by an evident need. Since Tagsistant will end up storing semantic rules (its ontologies) inside its SQL database (for faster processing and persistence across mount/umount), why should I use an XML format to store such things? I've started coding a Gtk+ based application which will manage ontologies directly inside the SQL database. An XML format is also planned, but only for exporting archives.

How will it be implemented? In a separate table, called (probably) relations, which is structured like:
For example, an equivalence will look like:

insert into relations values ('metal', 'equivalence', 'heavymetal');

while an inclusion will be:

insert into relations values ('music', 'includes', 'jazz');

How will that data be used? Inside build_querytree, a recursive search will be started for each tag specified, to extract additional tags which have to be used as extra search criteria. Each result will be chained inside the ptree_and_node_t structure of that tag: a new field, called related, will be added to ptree_and_node_t to store a linked list of related tags. Later, inside build_pathtree, each additional tag will be chained into the SQL view code by appending an or tagname = "%s" clause to the string.

As a consequence, if a user browses ~/tags/music and music has previously been declared a superset of jazz (as in the example "music" "includes" "jazz"), the SQL code that builds the view for this path will become:

select filename from tagged where tagname = "music" or tagname = "jazz"

Sounds quite easy, doesn't it? |
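To make the plan above a bit more concrete, here is a minimal sketch of the recursive expansion and of the resulting view query. It is written in Python with sqlite3 purely for illustration and is not Tagsistant's actual code. Several things in it are assumptions: the column names tag1, relation and tag2 (the text only shows three positional values), the helper names expand_tag and build_view_query, the sample file names, and the choice to treat an equivalence as symmetric. It also uses bound parameters instead of the "%s" string formatting described above.

```python
import sqlite3


def expand_tag(conn, tag, seen=None):
    """Return `tag` followed by every tag reachable from it via relations.

    Plays the role of the planned `related` list: a recursive search
    that collects additional tags to use as extra search criteria.
    """
    if seen is None:
        seen = []
    if tag in seen:
        return seen  # already visited: protects against cyclic relations
    seen.append(tag)

    # Forward direction: tag includes X, or tag is equivalent to X.
    forward = conn.execute(
        "select tag2 from relations "
        "where tag1 = ? and relation in ('includes', 'equivalence')",
        (tag,),
    ).fetchall()
    # Reverse direction, for equivalences only (assumption: symmetric).
    reverse = conn.execute(
        "select tag1 from relations "
        "where tag2 = ? and relation = 'equivalence'",
        (tag,),
    ).fetchall()

    for (related,) in forward + reverse:
        expand_tag(conn, related, seen)
    return seen


def build_view_query(tags):
    """Chain one `tagname = ?` test per tag, joined with `or`."""
    where = " or ".join("tagname = ?" for _ in tags)
    return f"select filename from tagged where {where}"


# Illustrative data: the two relations from the text plus made-up files.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table relations (tag1 text, relation text, tag2 text);
    create table tagged (filename text, tagname text);
    insert into relations values ('metal', 'equivalence', 'heavymetal');
    insert into relations values ('music', 'includes', 'jazz');
    insert into tagged values ('kind_of_blue.ogg', 'jazz');
    insert into tagged values ('anthem.ogg', 'music');
    insert into tagged values ('notes.txt', 'todo');
    """
)

tags = expand_tag(conn, "music")
sql = build_view_query(tags)
files = sorted(row[0] for row in conn.execute(sql, tags))

print(tags)
print(sql)
print(files)
```

Browsing music in this sketch also picks up the file tagged only jazz, while browsing jazz does not pick up files tagged only music, since an inclusion is followed in one direction only.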
