Yahoo Releases Internet Location Platform

View original post found on ProgrammableWeb authored by Raymond Yee

This week Yahoo announced its new Internet Location Platform API, “a resource for managing all geo-permanent named places on Earth” that “provide[s] the Yahoo! Geographic Developer Community with the vocabulary and grammar to describe the world’s geography in an unequivocal, permanent, and language-neutral manner.” (For more details see our new API profile.)

The RESTful read-only API associates “almost any named place on the Earth” with a unique identifier (called a “Where on Earth ID” or WOEID). Using Berkeley, California as an example, you can see how the API uses WOEIDs to describe the relationships among locations. First, to look up the WOEID for Berkeley, CA:

http://where.yahooapis.com/v1/places.q('Berkeley%20CA%20USA')

returns an XML response that tells you, among other things, the WOEID for Berkeley, CA (2362930), the centroid of Berkeley (37.869499, -122.270531), and a bounding box for the city. Going the other way, you can get the same data for a given WOEID. In our example,

http://where.yahooapis.com/v1/place/2362930

resolves the WOEID of 2362930 back to Berkeley, CA.
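A minimal Python sketch of how the two lookup URLs above are put together. The helper names `places_query_url` and `place_url` are illustrative and not part of any Yahoo! library; the sketch only builds the URLs and makes no request.

```python
from urllib.parse import quote

BASE = "http://where.yahooapis.com/v1"

def places_query_url(text: str) -> str:
    """Build a places.q(...) lookup URL from free-text place names."""
    # Percent-encode the query text (spaces become %20), then wrap it in single quotes.
    encoded = quote(text, safe="")
    return f"{BASE}/places.q('{encoded}')"

def place_url(woeid: int) -> str:
    """Build the URL that resolves a WOEID back to its place record."""
    return f"{BASE}/place/{woeid}"

print(places_query_url("Berkeley CA USA"))
print(place_url(2362930))
```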

The Yahoo! Internet Location Platform describes the relationships among locations in a variety of hierarchies. First of all, there is the concept of a location’s parent. Any given WOEID has at most one parent. You can get the parent of Berkeley, CA with

http://where.yahooapis.com/v1/place/2362930/parent?select=long

which returns a detailed record for Alameda County, the county in which Berkeley is located. You can then ask the API for the entire parent-oriented hierarchy (a WOEID’s parent, parent’s parent, etc.) with

http://where.yahooapis.com/v1/place/2362930/ancestors

which returns Alameda County -> California -> United States -> North America.
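The “at most one parent” rule means an ancestor list is just a walk up a chain of parent links. Here is a small Python sketch of that idea; the `PARENT` dictionary is a local stand-in filled with the Berkeley chain quoted above, not data fetched from the API, and the function name is an assumption for illustration.

```python
from typing import List

# Parent links from the Berkeley example; each place has at most one parent.
PARENT = {
    "Berkeley": "Alameda County",
    "Alameda County": "California",
    "California": "United States",
    "United States": "North America",
}

def ancestors(place: str) -> List[str]:
    """Follow parent links upward until a place with no recorded parent is reached."""
    chain = []
    while place in PARENT:
        place = PARENT[place]
        chain.append(place)
    return chain

print(" -> ".join(ancestors("Berkeley")))
```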

You can use the API to ask for the children of a WOEID. The URL

http://where.yahooapis.com/v1/place/2362930/children;count=100

returns the locations whose parent is Berkeley, CA. You’ll see a variety of place types in the list, including “suburbs” (e.g., South Berkeley) and zip codes (e.g., 94703). Note the use of the matrix parameter count to raise the maximum number of children returned to 100.
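Matrix parameters sit inside the path, separated by semicolons, rather than after a question mark. A short Python sketch of building such URLs follows; `relation_url` is a hypothetical helper, and it only formats strings.

```python
BASE = "http://where.yahooapis.com/v1"

def relation_url(woeid: int, relation: str, **matrix) -> str:
    """Build /place/{woeid}/{relation} with optional ;key=value matrix parameters."""
    url = f"{BASE}/place/{woeid}/{relation}"
    # Matrix parameters are appended to the path segment with ';', not after '?'.
    for key, value in matrix.items():
        url += f";{key}={value}"
    return url

print(relation_url(2362930, "children", count=100))
print(relation_url(2362930, "ancestors"))
```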

Not surprisingly, you can ask for the siblings of a WOEID, all the other locations that share the same parent. Among the siblings of Berkeley, CA listed in http://where.yahooapis.com/v1/place/2362930/siblings;count=100 are such places as Albany, Oakland, and Union City, which are other cities located in Alameda County.

What makes the Yahoo! Internet Location Platform especially interesting is how it supports “unofficial” or “informal” relationships beyond the strict one-parent-at-most hierarchy, through its belongtos and neighbors methods. The URL

http://where.yahooapis.com/v1/place/2362930/neighbors;count=100

returns “neighbors” to Berkeley, CA including Albany (which has the same parent as Berkeley) but also El Cerrito (which has a different parent, namely, Contra Costa County). It’s puzzling, however, why Oakland is not considered a neighbor to Berkeley. I suppose the caveat “that neighbors are not necessarily geographically contiguous” can be extended to “geographically contiguous locations are not necessarily neighbors”.
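One way to picture the difference: siblings can be derived from parent links, while neighbors have to be recorded as a separate relation. The Python sketch below uses only the places named in this post; the neighbor set is deliberately partial, and both dictionaries are local stand-ins rather than API output.

```python
# Parent links for the cities mentioned in the text.
PARENT = {
    "Berkeley": "Alameda County",
    "Albany": "Alameda County",
    "Oakland": "Alameda County",
    "El Cerrito": "Contra Costa County",
}

# Partial neighbor list for Berkeley, limited to the places named in the text.
NEIGHBORS = {"Berkeley": {"Albany", "El Cerrito"}}

def siblings(place: str) -> set:
    """Siblings are derived: every other place with the same parent."""
    return {p for p, parent in PARENT.items()
            if parent == PARENT[place] and p != place}

def neighbors(place: str) -> set:
    """Neighbors are stored explicitly; parent links cannot produce them."""
    return NEIGHBORS.get(place, set())

sib, nbr = siblings("Berkeley"), neighbors("Berkeley")
print(sorted(nbr - sib))  # neighbors with a different parent
print(sorted(sib - nbr))  # siblings that are not listed as neighbors
```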

By using the belongtos verb, you can also learn about various informal aggregations. The URL

http://where.yahooapis.com/v1/place/2362930/belongtos;count=100

returns not only official containing units for Berkeley (e.g., Alameda County and California) but also somewhat fuzzier units (West Coast, San Francisco Bay Area, and New Southwest).

Finally, there’s support for “location names in multiple languages including English, French, German, Italian, Spanish and Dutch as well as local double-byte character set data in Japan, Hong Kong, Korea and Taiwan.” For example, look at the record for the USA in English, French, and Chinese.

WOEIDs are already being used in Flickr and its API, as Rev Dan Catt documents. Presumably, the Location Platform API will be used throughout Yahoo!’s other services (such as the Fire Eagle API).

However, will WOEIDs be widely adopted outside of the context of using Yahoo!’s services? The breadth of coverage should be attractive to many developers:

The Yahoo! Internet Location Platform contains about six million places. Coverage varies from country-to-country but globally includes several hundred thousand unique administrative areas with half a million variant names; several thousand historical administrative areas; over two million unique settlements and suburbs, and two-and-a-half million unique postcode points covering about 150 countries, plus a significant number of points of interest, Colloquial Regions, Area Codes, Time Zones, and Islands.

However, there are important questions about ownership of and restrictions on this data, and about the possibility of lock-in. It’d be interesting, for instance, to compare the functionality and depth of the Yahoo! Internet Location Platform with the data and services currently in the GeoNames API (see its ProgrammableWeb profile and the list of 38 mashups using the API). On a technical note, the current API doesn’t seem to support reverse geocoding (returning a place name for a given latitude and longitude), functionality found in the Flickr API and the GeoNames API.

For further analysis, you can read Marshall Kirkpatrick’s piece in ReadWriteWeb, Brady Forrest in O’Reilly Radar, as well as Rev Dan Catt’s detailed analysis, mentioned above.

Semantic Search the US Library of Congress

View original post found on ProgrammableWeb authored by Raymond Yee

As the national library of the United States, the Library of Congress has created vast amounts of metadata to describe books and other documents in its collection. Among this metadata is the Library of Congress Subject Headings (LCSH), a “controlled vocabulary” for classifying documents by subject. In other words, experts at the Library of Congress have come up with a (large) list of subject headers from which catalogers of documents can choose. As an example, if you look at the Library of Congress record for Tim Berners-Lee’s book Weaving the Web, you’ll see that it is classified under “World Wide Web”, specifically “World Wide Web–History”.

Since the Library of Congress isn’t the only entity that classifies documents, you can imagine that other entities (and not just libraries) would be interested in reusing the LCSH vocabulary. But how should the Library of Congress make LCSH available so that it can be easily reused?

That’s where the recent release of lcsh.info comes in (see also the lcsh.info ProgrammableWeb Profile):

This is an experimental service that makes the Library of Congress Subject Headings available as linked-data using the SKOS vocabulary. The goal of lcsh.info is to encourage experimentation and use of LCSH on the web with the hopes of informing a similar effort at the Library of Congress to make a continually updated version available. More information about the Linked Data effort can be found on the W3C Wiki.

Let’s look at what you can do with lcsh.info through a couple of examples. First, we return to the subject heading World Wide Web, this time accessible from lcsh.info as

http://lcsh.info/sh95000541

Note the form of the URL: http://lcsh.info/{lccn} where lccn refers to the Library of Congress Control Number (LCCN), an identifier of the subject heading. In this case, the LCCN for World Wide Web is sh95000541.

If you drop this URL into your browser, you’ll get the default format or representation of the information lcsh.info has about the World Wide Web subject header, including:

The diagram below illustrates some of these relationships:

[Figure: lcshgraph.png]

To facilitate reuse of the data, lcsh.info offers its data in a variety of formats that can be accessed via content negotiation. That is, you use the Accept HTTP header to specify which of the following content types you want:

  • XHTML (with embedded RDFa), which is the default value (application/xhtml+xml)
  • JSON (application/json)
  • RDF/XML (application/rdf+xml)
  • N3 (text/n3)

For example, you can use curl to get the JSON representation of the World Wide Web subject header:

curl -v -L -H "Accept: application/json" http://lcsh.info/sh95000541
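The same content negotiation can be sketched in Python with the standard library. The helper name `lcsh_request` is an assumption for illustration, and the LCCN is the one given earlier for World Wide Web; the code prepares the request and inspects it without sending anything.

```python
from urllib.request import Request

def lcsh_request(lccn: str, media_type: str) -> Request:
    """Prepare (but do not send) a GET request for a subject heading in a given format."""
    # URL form http://lcsh.info/{lccn}; the Accept header selects the representation.
    return Request(f"http://lcsh.info/{lccn}", headers={"Accept": media_type})

req = lcsh_request("sh95000541", "application/json")
print(req.full_url)
print(req.get_header("Accept"))
```

To actually fetch the data you would pass `req` to `urllib.request.urlopen`, which follows redirects by default, much like curl’s -L flag. Swapping the media type for application/rdf+xml or text/n3 asks for the other representations listed above.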

By looking at the RDF/XML and N3 representations, you can see a concrete example of semantic web approaches to express notions of broader, narrower, and related terms as well as alternative labels using

  • Simple Knowledge Organization System (SKOS), which is “a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other types of controlled vocabulary”
  • design rules for linked data to represent the network of interconnected subject headings

This experimental but promising service may soon pave the way for full production-level web services from the Library of Congress.
