noko4.us | Blog | Demos | Links | Contact

Blog

Some History

Dan Bikle 2014-07-13

I use Nokogiri to saw out sections of HTML from webpages.

Then I use it again to glue those sections together in different ways.

If you see the web as a vast wood-shop of reusable components, Nokogiri is your tool.

According to rubygems.org, Nokogiri has been around since late 2008:

http://rubygems.org/gems/nokogiri/versions

I discovered Nokogiri through Ajay Kapur, who founded a site named Moovweb:

http://www.moovweb.com

Moovweb builds technology which transforms web content intended for desktop browsers into web content targeted at mobile devices.

http://www.google.com/search?q=nokogiri+moovweb

Before Nokogiri, I had been sawing on sites with Hpricot, which came to my attention in 2006.

But then in early 2010, Ajay told me he preferred Nokogiri because it was faster.

Obviously other people agree with him because Nokogiri now has many more downloads than Hpricot (according to rubygems.org).

Although Nokogiri is useful for what I call the HTML-transformation use-case, it is also used frequently for QA of websites.

For example, if you offer website operation services and you need to check regularly that your websites are serving the HTML specified in a Git repository, then Nokogiri (or something like it) is probably under the hood of your QA software.

Another use-case is Behavior Driven Development (BDD) of Rails sites.

A Product Manager might start a BDD session by saying, "I want the home-page to have links: Blog, Sign-In, Sign-Up, and Contact".

Then he/she might write an RSpec-feature which looks for the links.

After I add the links, the site will pass the spec and the Product Manager will be happy.

Every time the site is changed and committed to Git, we run the RSpec-feature again to ensure the links were not removed by mistake.

As we build the site and add features, we add tests to ensure each behavioral component is maintained.

Eventually the site might be protected by hundreds, maybe thousands of these tests.

Underneath the hood of these RSpec-tests, somewhere, would be Nokogiri.

Nokogiri is useful software.

Interact with it here.

It will help you saw-away at certain types of problems.

A Simple Nokogiri Use Case: IBM Option Prices

Dan Bikle 2014-07-14

Some people believe that sometimes option-price-deltas have an effect on stock-price-deltas.

In order to study this possible dependency, I need to capture both types of deltas over a period of time.

For IBM, capturing stock-price-deltas is easy.

Just click on this link and Yahoo will gladly give you IBM price data going back to 1962:

http://real-chart.finance.yahoo.com/table.csv?s=IBM
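Once the CSV is downloaded, the stock-price-deltas can be computed with Ruby's standard csv library. This is only a sketch: the column names (Date, Close) and the newest-first ordering are assumptions about the file, and the sample prices are made up.

```ruby
require "csv"

# Made-up sample rows in the shape I assume the CSV has
# (header row, newest date first). The prices are not real data.
sample_csv = <<~CSV
  Date,Open,High,Low,Close,Volume,Adj Close
  2014-07-11,188.0,189.0,187.0,188.5,1000,188.5
  2014-07-10,186.0,188.0,185.5,187.0,1200,187.0
  2014-07-09,187.5,188.0,186.0,186.5,900,186.5
CSV

rows = CSV.parse(sample_csv, headers: true)

# Sort oldest-first, then pair each day with the day before it.
closes = rows.map { |r| [r["Date"], r["Close"].to_f] }.sort_by(&:first)
deltas = closes.each_cons(2).map do |(_, prev_close), (date, close)|
  [date, (close - prev_close).round(2)]
end

deltas.each { |date, delta| puts "#{date},#{delta}" }
```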

For IBM, capturing option-price-deltas is more difficult.

Nokogiri is well suited for capturing current option-prices.

So, I use it to capture current option-prices once a day using a cron-script.

After a year I have a year's worth of option-prices, which I can use to study a possible dependence of stock-prices on option-prices.

To start work on my Nokogiri cron-script, I install the Firebug plugin in my Firefox browser.

Next I load this URL into Firefox:

http://finance.yahoo.com/q/op?s=IBM+Options

Then I right-click on the table in the above page so I can ask Firebug for the CSS path of that table.

https://www.google.com/search?q=How+I+get+CSS+path+from+Firebug

Firebug tells me that the CSS Path of the HTML-table I am interested in is this:

html.yui3-js-enabled.js.no-touch.csstransforms3d.svg.fullscreen
body.options.intl-us.yfin_gs.gsg-0.ff.ff31
div#screen
div#rightcol
table#yfncsumtab
tbody
tr
td
table.yfnc_datamodoutline1
tbody
tr
td
table

I use a form on this site (with fields for the page to saw on and the Nokogiri search expression) to experiment with various CSS paths combined with Nokogiri.

After I used that form to study how Nokogiri reacted to my CSS paths, I wrote this Ruby script:

#
# ibm_opt.rb
#

# I use this script to get option prices into CSV format.

require "open-uri"
require "nokogiri"

opt_url = "http://finance.yahoo.com/q/op?s=IBM+Options"
hdrs = {
  "User-Agent"     => "Mozilla/5.0 (Windows NT 6.3; Win64; x64)",
  "Accept-Charset" => "utf-8",
  "Accept"         => "text/html"
}

# URI.open comes from open-uri (Ruby 2.5+).
# Since Ruby 3.0 a bare open() no longer fetches URLs.
my_html = URI.open(opt_url, hdrs).read

doc = Nokogiri::HTML(my_html)
noko_tables = doc.css("table#yfncsumtab table.yfnc_datamodoutline1 table")
# I should now have 2 noko_tables.
# Use nested loop to generate CSV content:
noko_tables.css("tr").each do |tr_el|
  tds = tr_el.css("td")
  # Skip rows with no td cells (css returns an empty set, never nil).
  next if tds.empty?
  # join puts commas only between cells, so no trailing comma is left.
  puts tds.map(&:content).join(",")
end
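One thing to watch for when writing cells out by hand: cell text scraped from a page can itself contain commas (a volume shown as 1,234, for instance), which would split one field into two. A small sketch of one way around that, using the standard csv library; the cell values here are made up and stand in for the td_el.content strings.

```ruby
require "csv"

# Made-up cell values standing in for td_el.content;
# note the comma inside the last one.
cells = ["IBM140719C00185000", "3.45", "1,234"]

naive_line = cells.join(",")           # the comma in "1,234" splits the field
safe_line  = CSV.generate_line(cells)  # quotes any field that contains a comma

puts naive_line
puts safe_line
```

CSV.generate_line also appends the line ending, so print (rather than puts) would avoid any doubt about extra newlines when writing many rows.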

Then I wrote this shell script:

#!/bin/bash

# ibm_opt.bash

# I use this script to call a Ruby script which uses Nokogiri to get
# option prices from the web and then convert them to CSV format.

# Stop if the directory is missing, so the CSV is not written elsewhere.
cd /home/dan/price_getter/ || exit 1
time_now=$(date +%Y_%m_%d_%H_%M)
ruby ibm_opt.rb > "ibm_opt_${time_now}.csv"
exit


Then I wrote this cron entry, which uses both scripts to collect the option prices into a new CSV file once each weekday at 18:59:

59 18 * * mon,tue,wed,thu,fri /home/dan/price_getter/ibm_opt.bash

After I collect option prices for a few years, I can then study relationships between options and stocks.
