Difference between revisions of "Navigating to HTML page"


Revision as of 07:46, 16 July 2007

The next step is to retrieve the HTML pages created in the previous step. Here I have used the Ruby library 'open-uri' to retrieve each web page and another library, 'hpricot' (http://code.whytheluckystiff.net/hpricot), to edit these pages and translate the HTML markup into ConTeXt markup.
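
To give a rough idea of what translating HTML markup into ConTeXt markup involves, here is a minimal sketch. It is not the scrape_page filter required by the script on this page: the name html_to_context is hypothetical, it uses plain regular expressions instead of hpricot, and it only covers a handful of tags that I have assumed for illustration.

```ruby
# Illustrative only: a tiny HTML-to-ConTeXt translator built on regular
# expressions. The method name and the tag mapping are assumptions made
# for this sketch; they are not taken from scrape_page.rb.
def html_to_context(html)
  text = html.dup
  # Headings become ConTeXt sectioning commands.
  text.gsub!(%r{<h1>(.*?)</h1>}im) { "\\chapter{#{$1}}\n\n" }
  text.gsub!(%r{<h2>(.*?)</h2>}im) { "\\section{#{$1}}\n\n" }
  # Inline emphasis becomes a font switch inside a group.
  text.gsub!(%r{<(b|strong)>(.*?)</\1>}im) { "{\\bf #{$2}}" }
  text.gsub!(%r{<(i|em)>(.*?)</\1>}im) { "{\\em #{$2}}" }
  # Paragraphs are separated by a blank line, as ConTeXt expects.
  text.gsub!(%r{<p>(.*?)</p>}im) { "#{$1}\n\n" }
  # Drop any remaining tags and surrounding whitespace.
  text.gsub(/<[^>]+>/, '').strip
end

puts html_to_context("<h1>Annual Report</h1><p>Results were <b>good</b>.</p>")
```

A sketch like this does not decode HTML entities or escape characters that are special in ConTeXt (such as % and #), and regular expressions cope poorly with nested or malformed markup, which is one reason to use a real parser such as hpricot for the actual pages.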


#!/usr/bin/ruby
# scan_page.rb: retrieves the HTML page of interest from the server,
#               navigates to the links within the main page, and
#               constructs a ConTeXt document

require 'rubygems'
require 'open-uri'        # the open-uri library: lets open() fetch a URL
require 'hpricot'         # the hpricot library: HTML parser
require 'scrape_page'     # user-defined function to filter HTML into ConTeXt

# scan the home page and list
# all the directories and subdirectories

doc = Hpricot(open("http://ipa.dd.re.ss/AnnRep07"))