
Sometimes a downloaded HTML page is interpreted with the wrong encoding, which garbles accented letters and other special characters.
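To make the failure concrete, here is a minimal sketch that needs no network access. The sample string and the variable names are illustrative assumptions, not part of the original code. It takes bytes written as ISO-8859-1 and shows what happens when they are merely labelled as UTF-8, compared with actually transcoding them.

```ruby
# The word "café" as ISO-8859-1 bytes: "é" is the single byte 0xE9.
latin1_bytes = "caf\xE9".b

# Labelling those bytes as UTF-8 does not convert them; the string is now invalid.
mislabelled = latin1_bytes.dup.force_encoding("UTF-8")
puts mislabelled.valid_encoding? # => false

# Transcoding from the real source encoding produces proper UTF-8.
converted = latin1_bytes.encode("UTF-8", "ISO-8859-1")
puts converted.valid_encoding?   # => true
puts converted                   # => café
```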

You can avoid these problems by converting the content to UTF-8, as follows:

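require 'rchardet' # Encoding detection gem: https://github.com//rchardet
Encoding.default_external = "UTF-8" # Sets default external encoding.

# Returns data as UTF-8, transcoding from the detected encoding when needed.
def ensure_utf8(data)
  encoding = detect_encoding(data)
  # Already UTF-8: only the encoding label needs to be set.
  return data.dup.force_encoding("UTF-8") if encoding == "UTF-8"
  # String#encode replaces Iconv, which was removed from the standard library in Ruby 2.0.
  data.encode("UTF-8", encoding, invalid: :replace, undef: :replace)
end

# Returns the detected encoding name, normalized to Ruby's canonical spelling.
# Falls back to ISO-8859-1 when detection fails or Ruby does not know the name.
def detect_encoding(data)
  name = CharDet.detect(data)['encoding'] # rchardet returns a hash with string keys
  Encoding.find(name).name
rescue ArgumentError, TypeError
  "ISO-8859-1"
end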

Then call ensure_utf8 on the downloaded content:

require 'net/http'  
url = URI.parse('http://www.example.com/index.html')  
res = Net::HTTP.start(url.host, url.port) {|http|  
  http.get(url.path)
}
valid_content = ensure_utf8(res.body)  
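If installing a detection gem is not an option, a rougher variant of the same idea is possible. The sketch below is an assumption of mine, not the method above: the helper name ensure_utf8_without_detection and its fallback parameter are hypothetical. It keeps the data when it is already valid UTF-8 and otherwise assumes ISO-8859-1.

```ruby
# Hypothetical helper (not from the original post): no detection gem required.
def ensure_utf8_without_detection(data, fallback = "ISO-8859-1")
  # Try the cheap case first: the bytes may already be valid UTF-8.
  candidate = data.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?

  # Otherwise assume the fallback encoding and transcode,
  # replacing anything that cannot be converted.
  data.encode("UTF-8", fallback, invalid: :replace, undef: :replace)
end

puts ensure_utf8_without_detection("caf\xC3\xA9".b) # UTF-8 bytes pass through unchanged
puts ensure_utf8_without_detection("caf\xE9".b)     # ISO-8859-1 bytes are transcoded
```

This guesses rather than detects, so a page in some other single-byte encoding will still come out with wrong characters. It is only a fallback for cases where real detection is unavailable.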

Hope it helps.


Endel Dreyer

Full-stack developer. Loves Ruby and JavaScript.



bugfixer / @endel

About programming, tools and solving problems.
