DTP


 
Lively discussions on the graphic arts and publishing — in print or on the web


Old 08-18-2007, 06:31 AM   #1
CarlSeiler
Member
 
Join Date: Oct 2005
Location: Denton, TX
Posts: 271
Default Byte-Order Mark found in UTF-8 File.

I'm converting one of my pages from handwritten HTML 4.01 to PHP that generates XHTML 1.0 Strict. When I went to validate it at W3C, after cleaning up some closing slashes I'd missed on my img tags, the page validated, but I also got this warning message with a little yellow bang:
Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.

I have no idea what that means, but indeed the file appears to have three little extra high-bit characters at the start. I don't have them in my php files, so I guess the server is adding them. In fact, I noticed that before I cleaned up my pages so it would validate correctly, Firefox would even display the characters. Now that it validates, Firefox doesn't display them, but I do get the warning message in the validator. Dillo 0.8 under Linux displays the characters.

How do I stop the BOM? Should I even worry about it?

Thanks,
Carl
Old 08-18-2007, 07:16 AM   #2
ktinkel
Founding Sysop
 
Join Date: Oct 2004
Location: In Connecticut, on the Housatonic River near its mouth at Long Island Sound.
Posts: 11,189
Default

Quote:
Originally Posted by CarlSeiler View Post
I have no idea what that means, but indeed the file appears to have three little extra high-bit characters at the start. I don't have them in my php files, so I guess the server is adding them.

How do I stop the BOM? Should I even worry about it?
We had a discussion about a related topic after we shifted to a new host a couple of months ago. Turned out I had created a different error from yours by using “UTF-8 no BOM” so that it was picked up by header.php. That is a no-no in that context, although it is a good way to encode HTML files.

Marjolein figured it all out. Not sure it is relevant for you, but look at the “RSS feeds are broken!!!” thread — maybe it will help.

   
Old 08-18-2007, 07:28 AM   #3
iamback
Member
 
Join Date: Oct 2005
Location: Amsterdam, NL
Posts: 4,894
Default

Quote:
Originally Posted by CarlSeiler View Post
Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.

I have no idea what that means, but indeed the file appears to have three little extra high-bit characters at the start.
(...)
How do I stop the BOM? Should I even worry about it?
Yes, you should worry about it as some browsers cannot handle it properly. You may see completely broken styling in IE for instance.

Strictly speaking, a file that's UTF-8 encoded does not need a byte-order mark (an indication for machines of how to interpret the bytes in the file), as there's only one way to do that for UTF-8 encoded files. But some editors add one anyway when you specify UTF-8 encoding, or simply do so by default. It's quite possible your PHP files have it too: PHP doesn't care about it, and anything in front of the opening <?php tag is sent to the browser as output.
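For anyone who wants to see those three bytes for themselves, here is a minimal Python sketch (Python is just a convenient choice here, not something the page in question uses). It checks whether a byte string starts with the UTF-8 BOM and shows why a BOM-unaware program displays three odd high-bit characters. The `page` value and the `has_utf8_bom` name are made up for illustration.

```python
import codecs

# The UTF-8 byte-order mark is the code point U+FEFF encoded as three bytes.
UTF8_BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


# Hypothetical page content with a BOM in front of the markup.
page = UTF8_BOM + b"<!DOCTYPE html>"
print(has_utf8_bom(page))

# A program that treats the bytes as Latin-1 instead of UTF-8 sees
# three separate characters (U+00EF, U+00BB, U+00BF) rather than one mark.
as_latin1 = UTF8_BOM.decode("latin-1")
print(len(as_latin1))
```

The same check works on the first three bytes read from any file opened in binary mode.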

How to stop it depends on your editor: most editors have a way of telling them to save a file encoded as UTF-8 but not to write a BOM. Consult your editor's documentation (or user support, forum, or whatever); if it can't do it, there are free (and non-free) editors that can do it for you. (For instance, I use both UltraEdit and Crimson Editor, and both are capable of writing (and editing) UTF-8 encoded files and not writing a BOM.)
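If the editor cannot be persuaded, the BOM can also be removed after the fact. The following is a rough Python sketch of that idea, not a recommendation of any particular tool: it rewrites a file without its leading BOM and leaves every other byte alone. The function name `strip_utf8_bom` and the file name in the usage comment are hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file, if present.

    Returns True if the file was changed, False if it had no BOM.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three BOM bytes, untouched.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Example use (hypothetical file name):
#   strip_utf8_bom(Path("index.php"))
```

Working on raw bytes rather than decoded text means the rest of the file is not re-encoded, so line endings and any non-ASCII characters stay exactly as they were. Keep a backup before running something like this over a whole site.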

   
__________________
Marjolein Katsma
Look through my eyes on Cultural Surfaces (soon!), My ArtFlakes shop and Flickr.
Occasionally I am also connecting online dots... and sometimes you can follow me on Marjolein's Travel Blog
Old 08-18-2007, 10:27 AM   #4
CarlSeiler
Member
 
Join Date: Oct 2005
Location: Denton, TX
Posts: 271
Default

Thanks for the posts, guys. I didn't think it was my editor, but after reading here and also this post at Stanford, I found that it does seem to be something my editor (Notepad++) is doing.

It turns out that I'm a whole version of Notepad++ behind, and the newer versions allow you to pick UTF-8 without BOM. I'll be upgrading in a few minutes.

Carl
Old 08-18-2007, 01:42 PM   #5
Michael Rowley
Member
 
Join Date: Jan 2005
Location: Ipswich (the one in England)
Posts: 5,105
Default

Marjolein:

Quote:
Strictly speaking, a file that's UTF-8 encoded does not need a byte-order mark
It doesn't need one in any manner of speech! But what happens when a file is UTF-16 encoded? Do some browsers give up, and if so, which?

   
__________________
Michael
Old 08-19-2007, 10:36 AM   #6
iamback
Member
 
Join Date: Oct 2005
Location: Amsterdam, NL
Posts: 4,894
Default

Quote:
Originally Posted by Michael Rowley View Post
But what happens when a file is UTF-16 encoded? Do some browsers give up, and if so, which?
A UTF-16 encoded file must have a byte-order mark. In the first place, to tell any program that tries to open it that it is UTF-16 encoded. If there is no BOM, there's no way to tell even whether it is supposed to be a text file or a binary file.

I have no idea of browser support for UTF-16, let alone UTF-16 without a BOM (which could be said not to be UTF-16 as a result). Is that even interesting? Are there any UTF-16 documents on the web? How many as compared to UTF-8 encoded documents? How many of those do not have a BOM? Should a browser care?

   
Old 08-19-2007, 03:12 PM   #7
Michael Rowley
Member
 
Join Date: Jan 2005
Location: Ipswich (the one in England)
Posts: 5,105
Default

Marjolein:

Quote:
A UTF-16 encoded file must have a byte-order mark
Of course! But if UTF-8 doesn't use BOMs, and no browser supports BOMs, it follows that a BOM is not any use at all in the language browsers use. I thought you might have some views on the matter.

   
__________________
Michael
Old 08-19-2007, 06:03 PM   #8
dthomsen8
Member
 
Join Date: Aug 2005
Location: Philadelphia, PA 19130
Posts: 2,158
Default Why UTF-8?

Quote:
Originally Posted by CarlSeiler View Post
... It turns out that I'm a whole version of Notepad++ behind, and the newer versions allow you to pick UTF-8 without BOM. ...
With the newest Notepad++, you can pick UTF-8 without BOM. If you are working on XHTML or HTML, why are you picking UTF-8?
Old 08-19-2007, 11:53 PM   #9
Shane Stanley
Staff
 
Join Date: Oct 2004
Location: Melbourne, Australia
Posts: 526
Default

Quote:
Originally Posted by iamback View Post
A UTF-16 encoded file must have a byte-order mark.
A UTF-16 encoded file should have a byte-order mark, but sadly it is not a requirement.
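To illustrate what the mark buys you in UTF-16, here is a small Python sketch. It assumes the data is already known to be UTF-16 of some kind (a UTF-32 little-endian BOM begins with the same two bytes, so this is not a general encoding detector). The function name `utf16_byte_order` is made up for the example.

```python
import codecs


def utf16_byte_order(data: bytes) -> str:
    """Report the byte order signalled by a leading UTF-16 BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "big-endian"
    return "unknown (no BOM)"


text = "Hi"
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
bare = text.encode("utf-16-le")  # same text, no BOM written

for sample in (le, be, bare):
    print(utf16_byte_order(sample))
```

With the BOM present, a decoder can pick the right byte order by itself; without it, the reader has to be told the order some other way or guess.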

Shane
Old 08-20-2007, 12:02 AM   #10
iamback
Member
 
Join Date: Oct 2005
Location: Amsterdam, NL
Posts: 4,894
Default

Quote:
Originally Posted by Michael Rowley View Post
Of course! But if UTF-8 doesn't use BOMs, and no browser supports BOMs, it follows that a BOM is not any use at all in the language browsers use. I thought you might have some views on the matter.
Well, my "view" (experience, rather) is that
  1. UTF-8 is widely used on the web and doesn't need but does allow a BOM (which merely states its encoding), so browsers should support it (or at least ignore it and not be confused by it);
  2. it is not true that "no browser supports BOMs";
  3. UTF-16 is rarely (if ever) used on the web; and
  4. finally, there is no "the language" that browsers use: most can use multiple languages, not limited to (X)HTML, CSS and JavaScript. The languages browsers use have nothing to do with support (or lack of support) for BOMs or different encodings.
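As a rough illustration of what "ignore and not be confused by" can look like in software, here is a Python sketch. It uses Python's `utf-8-sig` codec, which drops a leading BOM if there is one and otherwise behaves like plain UTF-8. The sample byte strings are invented for the example, and this says nothing about how any particular browser is implemented.

```python
# Two versions of the same hypothetical content, with and without a BOM.
raw_with_bom = b"\xef\xbb\xbf<html>"
raw_without_bom = b"<html>"

# The "utf-8-sig" codec strips a leading BOM when present.
tolerant_a = raw_with_bom.decode("utf-8-sig")
tolerant_b = raw_without_bom.decode("utf-8-sig")
print(tolerant_a == tolerant_b)

# Plain "utf-8" keeps the BOM as an invisible first character (U+FEFF),
# which is the kind of leftover that can confuse a parser downstream.
leaked = raw_with_bom.decode("utf-8")
print(leaked[0] == "\ufeff")
```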

   
All times are GMT -8.


Contents copyright 2004–2014 Desktop Publishing Forum and its members.