noko4.us | Blog | Demos | Links | Contact

Demos

Get a list of links from a page

The above demo is simple. I just cut a bunch of links out of a page and spread them on the table.

Now I get some glue and attach each link to its own list-element:

Attach links to a set of list-elements

The Ruby code I used to cut each link out of the page and then glue each link to an li-element is displayed below:

#
# noko_cut_attach.rb
# 

require "rubygems"
require "open-uri"
require "nokogiri"

usr_agnt = "Mozilla/5.0 (Windows NT 6.3; Win64; x64)"
hdrs = {"User-Agent"   => usr_agnt}
hdrs["Accept-Charset"] = "utf-8"
hdrs["Accept"]         = "text/html"

# Kernel#open stopped accepting URLs in Ruby 3.0, so call URI.open (provided by open-uri)
my_html = URI.open("http://rubygems.org", hdrs).read

new_html = "<ul>"

# I need only 3 lines of Nokogiri:
doc = Nokogiri::HTML(my_html)
noko_enum = doc.css("a")
noko_enum.each { |link| new_html << "<li>#{link.to_html}</li>" }

new_html << "</ul>"

p new_html

I describe the general idea behind the above script as, "Use the minimum amount of Nokogiri".

Most of the work in the above script is done using String objects and NodeSet#each.

Another approach is to rely more on Nokogiri and less on String:

#
# noko_cut_wrap_attach.rb
# 

require "rubygems"
require "open-uri"
require "nokogiri"

usr_agnt = "Mozilla/5.0 (Windows NT 6.3; Win64; x64)"
hdrs = {"User-Agent"   => usr_agnt}
hdrs["Accept-Charset"] = "utf-8"
hdrs["Accept"]         = "text/html"

# Kernel#open stopped accepting URLs in Ruby 3.0, so call URI.open (provided by open-uri)
my_html = URI.open("http://rubygems.org", hdrs).read

# Use Nokogiri instead of String to build a ul-element full of links
doc = Nokogiri::HTML(my_html)
ul_elem = Nokogiri::XML::Node.new "ul", doc
noko_enum = doc.css("a")
noko_enum.each { |anch|
  li_elem = Nokogiri::XML::Node.new "li", doc
  li_elem.parent = ul_elem
  # parent= moves the anchor out of the page and into the li-element
  anch.parent = li_elem
}

p ul_elem.to_html

Nokogiri is useful for interacting with element attributes.

The demo below shows how to get a list of href attributes from anchor-elements.

Get href attributes, display them as a set of list-elements

The Ruby code I used to cut each href-attribute out of the anchor-elements and then glue each href-string to an li-element is displayed below:

#
# noko_cut_href.rb
# 

require "rubygems"
require "open-uri"
require "nokogiri"

hdrs = {"User-Agent"=>"Mozilla/5.0 (Windows NT 6.3; Win64; x64)"}
hdrs["Accept-Charset"] = "utf-8"
hdrs["Accept"] = "text/html"

# Kernel#open stopped accepting URLs in Ruby 3.0, so call URI.open (provided by open-uri)
my_html = URI.open("http://rubygems.org", hdrs).read

new_html = "<ul>"

# I need only 3 lines of Nokogiri:
doc = Nokogiri::HTML(my_html)
noko_enum = doc.css("a")
noko_enum.each { |link| new_html << "<li>#{link.get_attribute('href')}</li>" }

new_html << "</ul>"

p new_html

Most of the work in the above script is done using String objects and NodeSet#each.

Another approach is to rely more on Nokogiri and less on String:

#
# noko_cut_href2.rb
# 

require "rubygems"
require "open-uri"
require "nokogiri"

hdrs = {"User-Agent"=>"Mozilla/5.0 (Windows NT 6.3; Win64; x64)"}
hdrs["Accept-Charset"] = "utf-8"
hdrs["Accept"] = "text/html"

# Kernel#open stopped accepting URLs in Ruby 3.0, so call URI.open (provided by open-uri)
my_html = URI.open("http://rubygems.org", hdrs).read

doc = Nokogiri::HTML(my_html)
ul_elem = Nokogiri::XML::Node.new "ul", doc

noko_enum = doc.css("a")
noko_enum.each { |anch|
  li_elem = Nokogiri::XML::Node.new "li", doc
  # content= stores the href as escaped text; to_s covers anchors with no href (nil)
  li_elem.content = anch.get_attribute('href').to_s
  li_elem.parent = ul_elem
}

p ul_elem.to_html

Sometimes it is convenient to copy the value of href into the content of each anchor-element:

Get href attributes, display them as a set of links inside list-elements

The demo above relies more on Nokogiri and less on String:

#
# noko_cut_href3.rb
# 

require "rubygems"
require "open-uri"
require "nokogiri"

hdrs = {"User-Agent"=>"Mozilla/5.0 (Windows NT 6.3; Win64; x64)"}
hdrs["Accept-Charset"] = "utf-8"
hdrs["Accept"] = "text/html"

# Kernel#open stopped accepting URLs in Ruby 3.0, so call URI.open (provided by open-uri)
my_html = URI.open("http://rubygems.org", hdrs).read

doc = Nokogiri::HTML(my_html)
ul_elem = Nokogiri::XML::Node.new "ul", doc

noko_enum = doc.css("a")
noko_enum.each { |anch|
  # content= stores the href as escaped text; to_s covers anchors with no href (nil)
  anch.content = anch.get_attribute("href").to_s
  li_elem = Nokogiri::XML::Node.new "li", doc
  li_elem.parent = ul_elem
  anch.parent    = li_elem
}

p ul_elem.to_html

In CSS I can search for an element using an attribute and its value:

https://www.w3schools.com/cssref/sel_attribute_value.asp

In Nokogiri I can also search this way but I need to add some quotes:

Search by attribute and exact attribute-value:

Search by attribute only:


I found evidence that I can use CSS to search for an element using a partial attribute value.

Here is a syntax example:

my_elements = doc.css("a[@title~='RubyGems']")

I cannot get Nokogiri to work as I expect with the above type of syntax.

So, I resorted to searching using XPath functionality in Nokogiri:

Search by attribute and partial attribute value using XPath:

The demo above relies on .xpath() rather than .css():

#
# xpath_partial_att.rb
# 

require "rubygems"
require "open-uri"
require "nokogiri"

hdrs = {"User-Agent"=>"Mozilla/5.0 (Windows NT 6.3; Win64; x64)"}
hdrs["Accept-Charset"] = "utf-8"
hdrs["Accept"] = "text/html"

# Kernel#open stopped accepting URLs in Ruby 3.0, so call URI.open (provided by open-uri)
my_html = URI.open("http://rubygems.org", hdrs).read

doc = Nokogiri::HTML(my_html)
ul_elem = Nokogiri::XML::Node.new "ul", doc

noko_enum = doc.xpath('//a[contains(@title,"RubyGems")]')
noko_enum.each { |anch|
  # content= stores the href as escaped text; to_s covers anchors with no href (nil)
  anch.content = anch.get_attribute("href").to_s
  li_elem = Nokogiri::XML::Node.new "li", doc
  li_elem.parent = ul_elem
  anch.parent    = li_elem
}

p ul_elem.to_html