Converting to XML or JSON with YQL
Posted Feb 25th, 2009 by David Calhoun in javascript, rss, xml
XML, JSON, YQL. Is that enough acronyms for you?
You’ve probably heard of XML and JSON, but maybe not YQL, Yahoo!’s SQL-esque query language, which was released to the public late last year. It’s primarily advertised as a service to access data from Yahoo! properties such as Flickr and Yahoo! News. What you may not know is that you can also use it to access any XML/RSS feed or HTML page (!), which Chris Heilmann demonstrated on Ajaxian. Yes, HTML! This eliminates a lot of the headache involved with checking for updates on a page that doesn’t offer an RSS feed. (Although YQL can read RSS feeds, it only exports to XML and JSON formats, so there’s more work involved if you want to convert the HTML to RSS.)
What makes YQL especially awesome is the YQL Console, which allows you to test queries instantly. There are a few example queries to get you started, but they all use YQL tables from Yahoo! sites. That’s cool, but what if we want to get information from any website? Easy!
For instance, here’s a query to get the first 3 links on google.com in JSON format (with a callback of googlelinks):
select * from html where url="http://www.google.com/" and xpath='//a' limit 3
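To run a query like this outside the console, it has to be URL-encoded into a request URL. Here is a minimal Python sketch of that step. The base URL is copied from the "uri" field of the sample response in this post; the `format` and `callback` parameter names and the `build_yql_url` helper are my own assumptions for illustration, so check them against whatever URL the console actually generates.

```python
from urllib.parse import urlencode

# Base URL copied from the "uri" field of the sample response in this post.
YQL_BASE = "http://query.yahooapis.com/v1/yql"

def build_yql_url(query, fmt="json", callback=None):
    """Return a request URL for a YQL query.

    The "format" and "callback" parameter names are assumptions,
    not something this post confirms.
    """
    params = {"q": query, "format": fmt}
    if callback:
        params["callback"] = callback
    # urlencode percent-encodes the query and turns spaces into "+".
    return YQL_BASE + "?" + urlencode(params)

query = 'select * from html where url="http://www.google.com/" and xpath=\'//a\' limit 3'
url = build_yql_url(query, callback="googlelinks")
print(url)
```

The encoding may differ cosmetically from the console's URL (for example in how `*` is escaped), but it decodes back to the same query string.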
The console gives us a big REST URL and also shows us the output of the query (which can also be fetched by visiting the REST URL):
googlelinks({
  "query": {
    "count": "3",
    "created": "2009-02-25T03:45:00Z",
    "lang": "en-US",
    "updated": "2009-02-25T03:45:00Z",
    "uri": "http://query.yahooapis.com/v1/yql?q=select+*+from+html+where+url%3D%22http%3A%2F%2Fwww.google.com%2F%22+and+xpath%3D%27%2F%2Fa%27+limit+3",
    "diagnostics": {
      "publiclyCallable": "true",
      "url": [
        {
          "execution-time": "28",
          "content": "http://www.google.com/robots.txt"
        },
        {
          "execution-time": "53",
          "content": "http://www.google.com/"
        }
      ],
      "user-time": "85",
      "service-time": "81",
      "build-version": "911"
    },
    "results": {
      "a": [
        {
          "href": "http://images.google.com/imghp?hl=en&tab=wi",
          "onclick": "gbar.qs(this)",
          "content": "Images"
        },
        {
          "href": "http://maps.google.com/maps?hl=en&tab=wl",
          "onclick": "gbar.qs(this)",
          "content": "Maps"
        },
        {
          "href": "http://news.google.com/nwshp?hl=en&tab=wn",
          "onclick": "gbar.qs(this)",
          "content": "News"
        }
      ]
    }
  }
});
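Because of the callback, the response is JSON wrapped in a function call (JSONP), which is meant to be loaded from a script tag in the browser. If you'd rather read it from a script of your own, the wrapper has to come off first. Here is a rough Python sketch of that, using an abbreviated copy of the response above; `strip_jsonp` is a hypothetical helper of mine, not part of YQL.

```python
import json

def strip_jsonp(text, callback):
    """Remove a `callback( ... );` wrapper and return the parsed JSON."""
    text = text.strip()
    prefix = callback + "("
    if not text.startswith(prefix):
        raise ValueError("response is not wrapped in " + callback + "()")
    # Drop the prefix, then the trailing ";" and the closing ")".
    body = text[len(prefix):].rstrip(";").rstrip()
    if not body.endswith(")"):
        raise ValueError("unterminated JSONP wrapper")
    return json.loads(body[:-1])

# Abbreviated copy of the response shown in this post.
sample = '''googlelinks({
  "query": {
    "count": "3",
    "results": {
      "a": [
        {"href": "http://images.google.com/imghp?hl=en&tab=wi", "content": "Images"},
        {"href": "http://maps.google.com/maps?hl=en&tab=wl", "content": "Maps"},
        {"href": "http://news.google.com/nwshp?hl=en&tab=wn", "content": "News"}
      ]
    }
  }
});'''

data = strip_jsonp(sample, "googlelinks")
# Each matched <a> element becomes one object under query.results.a.
links = [(a["content"], a["href"]) for a in data["query"]["results"]["a"]]
for title, href in links:
    print(title, href)
```

Leaving the callback off the request should make this step unnecessary, assuming the service then returns plain JSON.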
If you check out our YQL query, you can see the field that specifies the XPath of the data you want to access. Don’t worry, I hadn’t heard of XPath before this either. If you’ve read some of my previous articles, you know that I’m sort of a beginner transitioning into more intermediate stuff, and blogging about it along the way. Well, here’s something new I learned this time!
XPath is a standard query language for selecting parts of an XML document. The simple XPath in our query above was //a, which says “get all a elements from anywhere in the document”. And of course our “a” elements are our links!
You can read more on XPath syntax at W3Schools.
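To get a feel for what //a selects without going through YQL, here is a small Python sketch using the standard library's ElementTree. Two caveats: ElementTree only supports a subset of XPath, so the expression is written as `.//a` (every a element at any depth below the root), and it needs well-formed XML, which real-world HTML often is not. The markup below is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a fetched page.
doc = """<html><body>
  <div><a href="http://images.google.com/">Images</a></div>
  <p>Some text <a href="http://maps.google.com/">Maps</a></p>
  <a href="http://news.google.com/">News</a>
</body></html>"""

root = ET.fromstring(doc)

# ".//a" finds a elements at any depth, in document order,
# no matter which parent element they sit in.
anchors = root.findall(".//a")
hrefs = [a.get("href") for a in anchors]
labels = [a.text for a in anchors]
print(hrefs)
print(labels)
```

Note that all three links are found even though one is inside a div, one inside a p, and one directly under body.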
My ultimate goal for this was to convert HTML pages into RSS for the purpose of checking for new topics on forums I frequent. Unfortunately, not all of the forums provide an RSS feed, so I have to take the non-lazy route and actually visit the forum to check for updates.
Again, unfortunately, YQL will only convert into either XML or JSON format, not into RSS. RSS is a type of XML, but it has some constraints: a feed needs a specific structure (a channel with a title, link, and description, plus an item for each entry). So the next step here would be to take either the XML or JSON and write something to convert it to RSS. YQL takes care of the hard part - now it’s left to us to use the data.
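As a starting point for that conversion, here is a minimal Python sketch that turns a list of (title, url) pairs into a bare-bones RSS 2.0 document. It assumes the links have already been pulled out of the YQL results; the `links_to_rss` function, the feed title, and the description are all placeholders I made up, and a real feed would probably also want dates and descriptions per item.

```python
import xml.etree.ElementTree as ET

def links_to_rss(feed_title, feed_link, feed_description, links):
    """Build a minimal RSS 2.0 document from (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires these three elements on every channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description
    for title, url in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    # tostring escapes characters such as "&" in the URLs.
    return ET.tostring(rss, encoding="unicode")

# Sample data matching the links in the YQL results from this post.
links = [
    ("Images", "http://images.google.com/imghp?hl=en&tab=wi"),
    ("Maps", "http://maps.google.com/maps?hl=en&tab=wl"),
    ("News", "http://news.google.com/nwshp?hl=en&tab=wn"),
]

feed = links_to_rss("Google links", "http://www.google.com/",
                    "Links scraped with YQL", links)
print(feed)
```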
Note to self: write an HTML to RSS converter.
Nice article! You can complete the conversion to RSS by using Yahoo! Pipes. We’ve added a YQL “module” into the editor so you can run YQL queries inside Pipes. Then you need to rename a few fields (also in Pipes) to “title” and “description” (and a few other RSS XML elements) and then Pipes will give you an RSS feed.
@Jonathan Thanks for visiting! Awesome, I had definitely heard of Yahoo! Pipes but didn’t look into it further until now. I’ll definitely give it a shot to get the RSS feed! Thanks for the tip!