DTP


 
Lively discussions on the graphic arts and publishing — in print or on the web


12-05-2011, 12:35 PM   #1
BobRoosth
Member
 
Join Date: Jan 2005
Location: Los Angeles, Ca.
Posts: 933
Tool for stripping out Word markup

Does anyone have a good tool for removing all MS Word html markup before pasting into a web page?

I can do it with search and replace, but that is time-consuming. Surely there is something quick and easy.
12-05-2011, 02:20 PM   #2
terrie
Staff
 
Join Date: Oct 2004
Posts: 8,944

Have you tried copying and pasting into something like Notepad, or saving it from Word as plain ASCII text?

Terrie
12-05-2011, 03:51 PM   #3
BobRoosth
Member
 
Join Date: Jan 2005
Location: Los Angeles, Ca.
Posts: 933

Notepad strips a bit too much. I'd like to keep paragraphs and italic/bold. Notepad makes it all one paragraph, with <br>s between the logical paragraphs.
12-05-2011, 05:23 PM   #4
Michael Beloved
Member
 
Join Date: Sep 2008
Location: Brooklyn NY
Posts: 141

Have you tried Save As with the Web Page, Filtered file type in Word itself?

If you have done that and it is not stripping out enough, and if you do not want to use Notepad, then the next best thing is to move it into a program like Microsoft Expression Web 2 or Adobe Dreamweaver and do a find and replace there.

In that dialog box, there is a tab which allows you to find and replace one code with another. If you use that on the Code side, you can copy the undesirable Word codes and replace them all with one click, or individually, using that dialog box.

The same thing can be done using Notepad or, preferably, Notepad++, which also has a find and replace, I think.

   
__________________
michael beloved
12-05-2011, 05:48 PM   #5
BobRoosth
Member
 
Join Date: Jan 2005
Location: Los Angeles, Ca.
Posts: 933

I have used Web Page, Filtered. It does reduce the cr-p, but leaves some. S&R in Dreamweaver takes care of the rest. Its Clean Up Word HTML command does very little.

I'd like a one-step process. I see several programs that claim to do the job, all at $50 or so. I was hoping someone here had a Word macro or regex to do the job.
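
For what it's worth, a regex pass along these lines could be a starting point. It is only a minimal sketch in Python rather than a Word macro, and the function name `strip_word_markup` and the choice of patterns are assumptions for illustration, not taken from any of the programs mentioned in this thread. It assumes the goal is to keep tags such as <p>, <b> and <i> while dropping style blocks, comments, Office namespace tags, spans, and class/style/lang attributes.

```python
import re

# Matches class=, style= and lang= attributes (quoted or unquoted values).
_ATTR = re.compile(
    r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.I,
)


def strip_word_markup(html):
    """Remove common Word-specific markup, keeping plain structural tags."""
    # Drop <style> blocks and HTML comments (including conditional comments).
    html = re.sub(r"<style\b.*?</style>", "", html, flags=re.I | re.S)
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # Drop Office namespace tags such as <o:p> and </o:p>.
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # Unwrap <span> and <font> tags, keeping the text inside them.
    html = re.sub(r"</?(?:span|font)\b[^>]*>", "", html, flags=re.I)
    # Strip the unwanted attributes, but only inside tags, not in body text.
    return re.sub(r"<[^>]+>", lambda m: _ATTR.sub("", m.group(0)), html)


sample = (
    '<style>p.MsoNormal{margin:0}</style>'
    '<p class="MsoNormal" style="margin:0in">Hello '
    "<span lang=EN-US style='font-size:12pt'><b>bold</b></span>"
    ' and <i>italic</i><o:p></o:p></p>'
)
print(strip_word_markup(sample))
```

Regexes like these are brittle on unusual markup (for example, a ">" inside an attribute value), so it is worth running this on a copy and checking the result.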
12-05-2011, 06:24 PM   #6
Steve Rindsberg
Staff
 
Join Date: Nov 2004
Posts: 6,742

HTMLTidy, maybe?

http://www.w3.org/People/Raggett/tidy/
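
If Tidy is going to be run on every document, it may be worth scripting it. The sketch below assumes a `tidy` executable is installed and on the PATH, and it uses the `word-2000`, `bare` and `show-body-only` configuration options as I understand them from the Tidy documentation; check `tidy -help-config` for your version before relying on them. The helper names `tidy_command` and `run_tidy` are made up for this example.

```python
import shutil
import subprocess
from typing import List


def tidy_command(path: str) -> List[str]:
    """Build an HTML Tidy command line aimed at Word-generated HTML."""
    return [
        "tidy",
        "-quiet",
        "--word-2000", "yes",        # ask Tidy to strip Word 2000 style markup
        "--bare", "yes",             # ask Tidy to strip more Microsoft-specific HTML
        "--show-body-only", "yes",   # output only the contents of <body>
        path,
    ]


def run_tidy(path: str) -> str:
    """Run Tidy on a file and return the cleaned HTML as a string."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on the PATH")
    # Tidy exits with a non-zero status on warnings, so do not use check=True.
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    return result.stdout
```

Calling `run_tidy("page.htm")` on a copy of the file (the file name is just a placeholder) would return the cleaned body markup without touching the original.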

   
__________________
Steve Rindsberg
====================
www.pptfaq.com
www.pptools.com
and stuff
12-05-2011, 06:55 PM   #7
BobRoosth
Member
 
Join Date: Jan 2005
Location: Los Angeles, Ca.
Posts: 933

Useful, but does not kill this mess:

<p class="MsoNormal"> or the various spans. Wish it would kill the style def and these classes.
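
One way to kill the classes, spans and style definitions in a single pass is to whitelist the tags worth keeping and throw away everything else. The following is a rough sketch using Python's standard `html.parser` module. The whitelist in `KEEP`, the class name `WordCleaner` and the function `clean_word_html` are assumptions for illustration; extend the whitelist if links, headings or lists need to survive.

```python
from html import escape
from html.parser import HTMLParser

KEEP = {"p", "b", "i", "em", "strong", "br"}   # assumed whitelist of tags
SKIP = {"style", "script", "title"}            # tags whose content is dropped


class WordCleaner(HTMLParser):
    """Re-emit only whitelisted tags, without any attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")      # attributes are discarded

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_word_html(html):
    parser = WordCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<html><head><style>p.MsoNormal{margin:0}</style></head><body>'
    '<div class=WordSection1><p class="MsoNormal">One '
    '<span style="color:red"><b>two</b></span> <i>three</i><o:p></o:p></p>'
    '</div></body></html>'
)
print(clean_word_html(sample))
```

Because anything not on the whitelist is unwrapped rather than deleted, the text itself is kept; only the markup around it goes.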
12-05-2011, 07:08 PM   #8
Michael Beloved
Member
 
Join Date: Sep 2008
Location: Brooklyn NY
Posts: 141

There is one other thing that might work: open the file in Microsoft Expression Web or in Dreamweaver, then highlight the document or part of the document.
Then click Format, and then click Remove Formatting.
I am not sure how this works in Dreamweaver.

This does remove those odd Word HTML codes.

Of course, as a precaution, do this on a copy of the file.
If you do not find it in Dreamweaver, I would be willing to do it using my Expression Web 2, where it works for sure. I do not know what else it would delete, so you just have to try it on a duplicate file and see.

   
__________________
michael beloved
12-05-2011, 07:42 PM   #9
BobRoosth
Member
 
Join Date: Jan 2005
Location: Los Angeles, Ca.
Posts: 933

Thanks for the offer. This is an ongoing issue. HTML Tidy cleaned up most of it. DW's Clean Up Word HTML actually got the rest, except for an unnecessary <DIV>.

Thanks for the suggestions.
12-06-2011, 01:12 AM   #10
LoisWakeman
Staff
 
 
Join Date: Jan 2005
Location: Uplyme, Devon, England
Posts: 1,402

You'd need pretty powerful regular expressions in your Word macro to do that: the problem with Word HTML, and with the way most people use Word (i.e. the whole thing in Normal style with applied para and character formatting, rather than using styles rigorously), is that you get a completely variable style soup. I would (and do) tend to go the Notepad route and reapply bold and italic, myself. But it seems from your last post that you have almost gotten there, anyway!

All times are GMT -8.


Contents copyright 2004–2014 Desktop Publishing Forum and its members.