
Developing Standards for Semantic Search and Academia

A few months ago, I wrote a post about the Semantic Web – essentially a brief overview of the movement to improve how data is structured online. Developing and adopting these standards could create powerful new possibilities for using and sharing data. However, it is still an emerging, often confusing, and heavily fragmented field. I decided to start exploring this topic with a simple introduction.

Ruby Sinreich commented on the post, asking what I thought about the use of open formats in academia. Although I was involved in some academic research as an undergraduate student, most of my experience is in market and marketing research, both industrial and consumer.

This led me to consider how the differences between “academia” and “industry” may influence the development and adoption of semantic standards. These perceived “differences” are generalizations, of course, but they are worth considering because adoption remains a major unresolved issue for these standards.

“Wouldn’t it be so much easier if everyone agreed upon the same format?” seems to be a common question and, in my mind, one that misses the point. Different standards can be good, especially when they serve different purposes: sharing biomedical data, humanities research, internet search results, and so forth.

What happens when standards are “competing” for the same purpose? How will this be resolved, if at all? When standards are being developed, some voices will naturally be louder than others, so it will be interesting to see how things take shape as we move forward.

Semantic Standards and Open Formats in Academia

I feel that greater use of open formats could benefit academia a great deal. From what I understand about the topic, I think I’d agree with this piece I read by David Weinberger (via a post by Ruby Sinreich).

The fact that standards are being adopted at all seems like a very good thing in and of itself. That they are developing to some extent based on the needs of each particular discipline is great, even if that means less native compatibility between them.

Rather than attempting the impossible task of getting everyone to agree on one rigid set of specifications, developing bridges between these standards seems like a much better solution – as was suggested in the Jefferies article: “Aggregations as a key organising tool for this expanded universe of digital objects…”

I feel that it is an important part of the process for large projects and organizations (such as Wikimedia’s Wikidata initiative) and collaborations between groups (such as BioSharing.org) to work these things out as they go along. There will always be multiple metadata standards in use among different groups (especially between different fields), but I think the standards will naturally coalesce a great deal if these groups keep communicating and exchanging feedback.
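One way to picture a “bridge” between metadata standards is a simple crosswalk that renames fields from one vocabulary to another. The sketch below is only an illustration under my own assumptions: the pairing of Dublin Core elements with schema.org properties is a guess at reasonable equivalents rather than an official mapping, and the record and the `translate_record` helper are hypothetical.

```python
# Hypothetical crosswalk: Dublin Core element names -> schema.org property names.
# These pairings are illustrative assumptions, not an official mapping.
DC_TO_SCHEMA = {
    "dc:title": "name",
    "dc:creator": "author",
    "dc:date": "datePublished",
    "dc:description": "description",
}


def translate_record(record, crosswalk):
    """Rename the keys of a metadata record using a crosswalk.

    Fields without a mapping keep their original names,
    so nothing is silently dropped in translation.
    """
    return {crosswalk.get(key, key): value for key, value in record.items()}


# A made-up record, for illustration only.
dublin_core_record = {
    "dc:title": "A Sample Dataset",
    "dc:creator": "Jane Researcher",
    "dc:date": "2012-06-01",
    "dc:rights": "CC BY",
}

schema_org_record = translate_record(dublin_core_record, DC_TO_SCHEMA)
print(schema_org_record)
```

Keeping unmapped fields (here `dc:rights`) under their original names reflects the point above: each discipline can keep what is specific to it, while the shared fields become comparable.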

Developing Standards and Open Formats for Semantic Search

My experience with these concepts lies more in web development and online marketing. Standards like HTML, XML, CSS, and so forth have always been important concerns at our company, and in the industry at large, in terms of:

  • Whether a website is displaying correctly across different browsers and devices
  • Whether a website is ranking accurately (or advantageously, from a marketing standpoint) in search results

These days, the browser market is divided among relatively few browsers: Chrome, IE, and Firefox account for over 80% of (non-mobile) web browser usage. This means developers only really need to worry about how sites display in a few different browsers, and even then not nearly as much as in years past. The World Wide Web Consortium (W3C) standards organization, of which Google, Microsoft, and Mozilla are members, played an important role in getting to this point.

When it comes to search technology, however, things are a little murkier. Because there’s a clear advantage for a search provider in having a search engine that works much better than the alternatives, it might seem there is little to no reason for internet search providers to collaborate. However, just because their algorithms are trade secrets doesn’t necessarily mean that they can’t (or shouldn’t) agree upon semantic markup. It appears that this collaboration will center on schema.org.

At the same time, the search providers behind schema.org clearly point out that it is not a standards body like the W3C. This has led some to question these search providers’ influence (especially Google’s) in developing and controlling web markup standards. After all, the three search engines that launched schema.org (Google, Bing, and Yahoo) have over a 90% market share worldwide.
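To make the idea of shared semantic markup a little more concrete, here is a minimal sketch of describing an article with schema.org terms. It assumes the JSON-LD syntax, which is only one of several ways to express schema.org vocabulary (microdata and RDFa are others). The helper names and the placeholder values are mine, chosen purely for illustration.

```python
import json


def article_markup(headline, author_name, date_published):
    """Build a schema.org Article description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }


def as_script_tag(markup):
    """Wrap the JSON-LD in the script element used to embed it in an HTML page."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, for illustration only.
markup = article_markup("Example Headline", "Example Author", "2012-01-01")
tag = as_script_tag(markup)
print(tag)
```

The same description is readable by any search engine that understands the vocabulary, which is the appeal of agreeing on markup even while ranking algorithms stay secret.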

The big questions now seem to be:

  • To what extent will these search engines continue to support other standards?
  • How much of a role will they play in adoption of various formats?