#100 new
Serge Balyuk

[PATCH] POST params encoding in Ruby 1.9.1

Reported by Serge Balyuk | June 28th, 2010 @ 10:12 AM

Please find yet another approach to treating params with force_encoding in Ruby 1.9.1.

Ticket #48 spawned a great discussion. It seems that standards can be broken and browsers can misbehave. But I'm not sure why we should get params encoded in ASCII when the specified charset and the actual charset match (i.e. when the browser did its job well, which happens pretty often with UTF-8).

An additional feature is the env['rack.force_content_charset'] option, which can be used to override the request header setting, so that middleware can do the trick described by naruse (http://rack.lighthouseapp.com/projects/22435/tickets/48-rackutilsun...)
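
A minimal sketch of how a middleware might use this option. The env key is the one proposed by this patch and is not part of released Rack; the class name ForceContentCharset and the helper effective_charset are hypothetical and only illustrate that the forced value wins over the header charset.

```ruby
# Hypothetical middleware: sets the override before the app reads the params.
class ForceContentCharset
  def initialize(app, charset = 'UTF-8')
    @app = app
    @charset = charset
  end

  def call(env)
    # 'rack.force_content_charset' is the key proposed in this ticket.
    env['rack.force_content_charset'] = @charset
    @app.call(env)
  end
end

# Hypothetical lookup: the forced value, when present, takes precedence
# over the charset taken from the Content-Type header.
def effective_charset(env, header_charset)
  env['rack.force_content_charset'] || header_charset
end
```

With something like this in the stack, an application could force a charset for clients known to send a wrong or missing charset parameter.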

Utils::unescape was patched so that its result now preserves the encoding of the input string (this was not the case for strings containing hex-encoded parts).
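
The idea can be sketched roughly as follows. This is not the patched Rack code, just an illustration under the assumption that decoding happens on raw bytes and the input's encoding is re-applied afterwards; the method name unescape_preserving_encoding is hypothetical.

```ruby
# Decode %XX sequences and '+' while keeping the encoding tag of the input.
def unescape_preserving_encoding(string)
  original = string.encoding
  # Work on a binary copy so that single decoded bytes can be spliced in
  # without raising encoding compatibility errors.
  decoded = string.tr('+', ' ').b.gsub(/%([0-9a-fA-F]{2})/) { [$1.hex].pack('C') }
  # Re-tag the bytes with the encoding the caller passed in.
  decoded.force_encoding(original)
end
```

The result carries the same encoding as the argument, so a UTF-8 tagged input with hex-encoded parts comes back tagged as UTF-8 rather than as binary.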

This patch handles only body-encoded parameters (POST), because the charset parameter of Content-Type describes the body. I'm not sure how I should treat params that come from the URI query, though. Any comments are welcome.
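
To make the body-params treatment concrete, here is a rough sketch and not the patch itself. It assumes a flat hash of already-parsed string values; the helper names charset_from_content_type and tag_body_params are hypothetical. A value is re-tagged only when its bytes are valid in the declared charset, otherwise it is left as it was.

```ruby
# Extract the charset parameter from a Content-Type header value.
# Returns an Encoding, or nil when absent or unknown to Ruby.
def charset_from_content_type(content_type)
  name = content_type.to_s[/;\s*charset="?([^;\s"]+)"?/i, 1]
  name && Encoding.find(name)
rescue ArgumentError
  nil # charset name not known to this Ruby
end

# Tag each POST param value with the declared charset when the bytes fit.
def tag_body_params(params, content_type)
  encoding = charset_from_content_type(content_type)
  return params unless encoding

  params.each_with_object({}) do |(key, value), tagged|
    candidate = value.dup.force_encoding(encoding)
    # Keep the original value if the declared charset does not match the bytes.
    tagged[key] = candidate.valid_encoding? ? candidate : value
  end
end
```

Params from the URI query string are deliberately not touched here, matching the scope described above.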

BTW, I had an alternative implementation that called set_encoding on env['rack.input']. I liked the idea of setting the encoding of the whole body according to the request header and then letting it spread naturally everywhere. Although it didn't break any existing tests, it still seemed unsafe for the code that parses multipart form submissions. At the same time, that parsing implementation does not preserve the input stream's encoding in the resulting hash, so explicit force_encoding calls would still be required. So I've dropped that option for now.

Comments and changes to this ticket

  • Serge Balyuk

    Serge Balyuk June 28th, 2010 @ 10:21 AM

    • Tag changed from encoding, ruby-1.9 to encoding, patch, ruby-1.9
  • Serge Balyuk

    Serge Balyuk July 4th, 2010 @ 05:02 PM

    BTW, I discovered that Rails code overrides the request content_type method in ActionDispatch::Http::MimeNegotiation (master) and changes its semantics: in Rails it returns the MIME type (a string value in master and a MIME type object in 2.3), while in Rack it returns the full value of the header field. So Rails cuts off the content type options, and the charset is lost. Generally it's not very good to change method semantics in descendants (LSP and so on), but it seems the original content_type and content_charset weren't used before.

    I can update the patch and add a workaround for this issue, but I'd like to get some feedback first (i.e. is it worth the effort?).
