
The query is:

  select * from html where url="http://news.ycombinator.com/" and
  xpath='//tr/td/a[substring(@href,1,4)="http"][@href!="http://ycombinator.com"]'
You can play with it yourself here (needs a Yahoo login):

http://developer.yahoo.com/yql/console/
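
For anyone who wants to see what that XPath filter selects without the console, here is a rough sketch in plain Python. It is not how YQL evaluates the query; it only approximates the same rule (an `<a>` directly inside `<tr><td>`, whose href starts with "http" and is not the ycombinator.com link) using the standard-library HTML parser. `StoryLinkParser`, `extract_story_links`, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StoryLinkParser(HTMLParser):
    """Collect hrefs of <a> elements that sit directly inside <tr><td>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Approximates //tr/td/a: the two nearest ancestors must be tr, td.
        if tag == "a" and self.open_tags[-2:] == ["tr", "td"]:
            href = dict(attrs).get("href")
            # substring(@href,1,4)="http" and @href!="http://ycombinator.com"
            if href and href[:4] == "http" and href != "http://ycombinator.com":
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def extract_story_links(html):
    parser = StoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, loosely shaped like a table-based front page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="http://ycombinator.com">logo</a></td></tr>
  <tr><td><a href="http://example.com/story">A story</a></td></tr>
  <tr><td><span><a href="http://example.com/nested">nested</a></span></td></tr>
  <tr><td><a href="item?id=1">comments</a></td></tr>
</table>
"""

print(extract_story_links(SAMPLE_HTML))
```

On the sample markup only the `http://example.com/story` link survives: the logo link is excluded by name, the nested link is not a direct child of a `<td>`, and the relative link does not start with "http".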



You know, honestly, I wasn't grokking much about how cool YQL could be until I saw this example. This helps a lot. Thanks.


I don't get it. Don't queries like that suffer from the same problems as normal screen scraping?


Yeah, they're obviously brittle, but it's baby steps, y'know?

I think it's pretty crazy that you can now scrape well-marked pages with a SQL-like syntax.


YQL isn't that great client-side until they add an `Access-Control-Allow-Origin: *` header. I'm pretty disappointed that they completely disallow JavaScript access.
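
To make the request concrete, here is a small sketch of a JSON endpoint that sends that header. It is a local stand-in built on Python's standard library, not anything Yahoo runs; `CorsJsonHandler`, `demo`, and the response body are invented for the example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CorsJsonHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves JSON that any origin may read via XHR."""

    def do_GET(self):
        body = json.dumps({"results": []}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # The header under discussion: "*" lets scripts from any origin
        # read this response in browsers that implement CORS.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo():
    """Start the server on a free local port and return the header it sends."""
    server = HTTPServer(("127.0.0.1", 0), CorsJsonHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        conn.close()
        return response.getheader("Access-Control-Allow-Origin")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The header only matters to browsers; a server-side client like the one in `demo` can read the response with or without it.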


It offers JSONP instead, so you can access any YQL content from JavaScript in a browser using script nodes.
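
For readers who haven't met JSONP: the response is not bare JSON but a snippet of script that calls a function you name, which is why a script node can load it across origins. A minimal sketch of that wrapping, in Python for illustration only; `wrap_jsonp`, `unwrap_jsonp`, and the callback name `handleResults` are hypothetical and say nothing about how YQL builds its responses.

```python
import json
import re


def wrap_jsonp(callback, payload):
    """What a JSONP endpoint sends back: JSON wrapped in a function call."""
    return f"{callback}({json.dumps(payload)});"


def unwrap_jsonp(body, callback):
    """Recover the JSON payload from a JSONP body without executing it."""
    pattern = rf"\s*{re.escape(callback)}\((.*)\);?\s*"
    match = re.fullmatch(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("body is not a call to the expected callback")
    return json.loads(match.group(1))


body = wrap_jsonp("handleResults", {"links": ["http://example.com/story"]})
print(body)
print(unwrap_jsonp(body, "handleResults"))
```

A browser does not unwrap anything: it simply runs whatever script comes back, so the page trusts the provider to send only the expected call.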


You still have to give Yahoo! complete control with that method. With XHR, Yahoo! has no control over your webpage.


It only works in a few browsers, but it doesn't seem like it would hurt anything to add it. Thanks for the suggestion!


I love how HN is the only site where "[x] would be cool" turns into a feature request implemented by the vendor.


Nah, we do the same thing for things suggested on Twitter :)



