Editing Word and Office Documents — in batch

October 23rd, 2007

The file formats in Microsoft Office 2007 for Word, Excel, and PowerPoint (.docx, .xlsx, and .pptx, respectively) are based on ZIP technology. Just change the file extension to .zip, open the file in WinZip or a similar program, and you expose the internals of the file. And presto, you have XML files (organized into a set of files and folders) that can be edited in Notepad, by a batch script, or by any other means.

So, what does this mean for you, who need to efficiently manage your content in Word or other Office apps?  You can edit, delete, add or verify information automatically for all files in a folder, for example, without opening individual documents. 

Here is an example (from the user guide for Author Max™):

<dc:title>Author Max™ Toolbar Pro</dc:title>
<dc:subject>User guide</dc:subject> 

So if I wanted to change the metadata field subject (doc property subject in Wordspeak) for many documents, I could write a script that looked in the core.xml file for all Word documents and presto make the change.  (Of course, it might be simpler just to use Author Max™ to enforce rules for document properties and styles in the first place.)
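To make that concrete, here is a minimal sketch of such a script in Python. It rests on a few assumptions: that the core properties live in the part docProps/core.xml, that the subject is already present as a non-empty <dc:subject> element, and that the function names (replace_subject, set_subject, set_subject_in_folder) are my own invention for illustration. Try it on copies of your documents first.

```python
import re
import shutil
import tempfile
import zipfile
from pathlib import Path
from xml.sax.saxutils import escape

# Assumption: Word 2007 stores the core document properties in this part.
CORE_PART = "docProps/core.xml"
SUBJECT_RE = re.compile(r"<dc:subject>.*?</dc:subject>", re.DOTALL)


def replace_subject(core_xml, new_subject):
    """Return core_xml with the text of <dc:subject> replaced."""
    new_element = "<dc:subject>" + escape(new_subject) + "</dc:subject>"
    # A function is used as the replacement so backslashes are taken literally.
    return SUBJECT_RE.sub(lambda _match: new_element, core_xml)


def set_subject(docx_path, new_subject):
    """Rewrite one .docx, changing only the subject in core.xml."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".docx") as tmp:
        tmp_path = Path(tmp.name)
    # A ZIP archive cannot be edited in place, so copy every part to a new
    # archive and swap in the modified core.xml along the way.
    with zipfile.ZipFile(docx_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                text = data.decode("utf-8")
                data = replace_subject(text, new_subject).encode("utf-8")
            dst.writestr(item, data)
    shutil.move(str(tmp_path), str(docx_path))


def set_subject_in_folder(folder, new_subject):
    """Apply set_subject to every .docx in folder; return how many were changed."""
    count = 0
    for docx_path in sorted(Path(folder).glob("*.docx")):
        set_subject(docx_path, new_subject)
        count += 1
    return count
```

Calling set_subject_in_folder on a folder of manuals with a new subject string would update every .docx in it without opening Word. Documents whose subject was never filled in will not contain a matching element, so this sketch leaves them unchanged.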

Here’s another example, also taken from the Author Max documentation.

< ...w:val="center"/>© 2007 Method M Ltd. (All rights reserved)…

So if I wanted to change the footer for many documents to be left aligned, and the copyright year to be 2008, all that would be needed is to write a script that looked in the footer3.xml file for all Word documents and presto make the change.  (Of course, it might be simpler just to use Author Max™ to enforce rules for headers and footers in the first place.)  Need help implementing this or other Word functionality? Hey, that’s what our e-mail is for (info at methodm dot com).
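The footer edit itself could be sketched as a small Python function that works on the text of the footer part. This is an illustration under assumptions, not a tested recipe for every document: the function name update_footer is hypothetical, it assumes the alignment is stored as the attribute w:val="center", and it assumes the copyright sign and the year sit together in one piece of text (Word sometimes splits text across several runs).

```python
import re


def update_footer(footer_xml, old_year, new_year):
    """Left-align centered paragraphs and update the copyright year."""
    # Assumption: centering is expressed as the attribute w:val="center".
    # Every occurrence in the part is changed, so check the result.
    updated = footer_xml.replace('w:val="center"', 'w:val="left"')
    # Only touch the year when it directly follows the copyright sign,
    # so other uses of the same number are left alone.
    pattern = re.compile(r"(©\s*)" + re.escape(old_year))
    return pattern.sub(lambda match: match.group(1) + new_year, updated)
```

You would read the footer part out of each .docx as UTF-8 text, pass it through update_footer with "2007" and "2008", and write the result back into a fresh copy of the archive.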

Best wishes for clear, efficient and great writing!
Katriel

Succor for Victims of Word’s Automatic Formatting

September 24th, 2007

Tech writing forums regularly get hit with questions in the vein of “Word has corrupted my styles”. And the answers that come in are useful for some cases. Such as:

  • Always “paste unformatted”.
  • Deselect the option to Automatically Update Document Styles.
  • Deselect the option to Define Styles Based on Your Formatting.

However, what to do when my files already have helter skelter formatting? For this very need we have included “fix styles” in our Author Max toolkit. Details here.

Be a victim of Word’s automatic formatting “features” no longer. Equip yourself with Author Max and fight back.
Katriel

DITA Resources from DITA Users

September 4th, 2007

A useful list of DITA sites (thanks to Bob Doyle):

  • DITA Users - get started using a web-based editor and a personal workspace folder on the web.
  • DITA Infocenter - DITA specifications, and the DITA Open Toolkit User Guide.
  • DITA News - a blog aggregator, a mailing list, and more.
  • DITA Blog - a group blog for DITA information developers (based on WordPress).
  • DITA Wiki - resources in a format that encourages comments and discussions (based on MediaWiki).

Improve Your Relationships. Where to Code Links? Best practice.

September 3rd, 2007

Authors working in FrameMaker or Word have hard-coded links within topics to other topics using cross-references. When moving to DITA, authors often tend towards hard-coding links in topics, inserting cross references or using the  element.

What’s wrong with hard-coded links?
They decrease reusability, they tend to break, they tend to get out of date, and they are high maintenance.

  1. Decreased reusability: hard-coded links may not make much sense when a topic is reused, but if you hard-coded the links, you’re stuck with them.
  2. They tend to break: if the target topic is renamed or moved, the link will break.
  3. They tend to get out of date: if a related topic is added, the author has to look through many topics, find the appropriate locations, and insert the same link again and again.
  4. High maintenance: see reasons 2 and 3 above.

Hard-coded links are not a good idea in FrameMaker or Word either, but when working in unstructured DTP tools you didn’t have much choice. In DITA you do, and you should use it. “Relationship tables” in DITA allow you to control linking from one place, for many topics, rather than hard-coding links within many topics.

It’s not often that this blog for power authors is able to offer relationship advice, but today we are. Use relationship tables and start improving your documents!

Best wishes,
Abby, … oops, I mean Katriel

Word 2007 - Working in Compatibility Mode

August 8th, 2007

If you will be using any fancy Word 2007 features (see the previous post for a listing), and sharing your Word files with users of Word 2003 or earlier versions, you should consider working in compatibility mode.

Compatibility mode ensures that content created in 2007 can be opened and edited in 2003. For example, when you choose “Insert SmartArt” in compatibility mode, the 2003 diagramming tool appears. (When not in compatibility mode, the content created by the 2007 SmartArt diagramming tool will not be fully editable in Word 2003.)

Compatibility mode limits some features of 2007. Compatibility mode is automatically enabled when you open a *.doc file. (The words “Compatibility Mode” display in Word’s title bar.) You can switch from compatibility mode to full functionality for any document by selecting Convert from the Office Start Button.
Katriel

Word 2007 and Word 2003 Compatibility Issues

August 8th, 2007

Congratulations, you have Microsoft Word 2007. Excellent choice. But you have to share files with less evolved colleagues still using Word 2003 (or, gasp, an even earlier version). The bad news is that while Microsoft has a free download that enables Office 2003 users to open Office 2007 files, you may experience some disruptions. Microsoft’s compatibility checker sometimes refers to these issues with a message that includes the phrase “you may experience some minor loss of fidelity”. Well, minor is subjective - so here is a listing of issues (from Microsoft TechNet) that you should be aware of.

The next post will describe how to use Compatibility Mode when writing/editing in Word 2007 to proactively avoid these issues. 

| Name | Description | Compatibility Mode Behavior |
|---|---|---|
| Math | Equation building is new to Office Word 2007. | Equations are represented as non-editable images. These images are refreshed when the document is converted. The Equations UI is disabled in compatibility mode. |
| Themes | Themes are new to Office Word 2007. | Themes are permanently converted to styles. The Themes UI is disabled in compatibility mode. |
| Colors (Theme Chunk) | Themes are new to Office Word 2007. | Themes are permanently converted to styles. The Themes UI is disabled in compatibility mode. |
| Font (Theme Chunk) | Themes are new to Office Word 2007. | Themes are permanently converted to styles. The Themes UI is disabled in compatibility mode. |
| Effects (Theme Chunk) | Themes are new to Office Word 2007. | Themes are permanently converted to styles. The Themes UI is disabled in compatibility mode. |
| Content Controls | Content controls are new to Office Word 2007. | Content controls are permanently converted to static text. The Content Controls UI is disabled in compatibility mode. |
| Tracked Moves | Tracked moves are new to Office Word 2007. | Tracked moves are permanently converted to “Insert” and “Delete.” |
| Major/Minor Fonts | Major/minor fonts are new to Office Word 2007. | Major and minor fonts are permanently converted to static formatting. |
| Relative Text Boxes | The ability to set the position of a text box relative to some part of a document. Relative text boxes are new to Office Word 2007. | Relative positioning of text boxes is permanently converted to absolute positioning. |
| Margin Tabs | Margin tabs are new to Office Word 2007. | Margin tabs are permanently converted to absolutely defined tabs. |
| Bibliography | New to Office Word 2007. | Bibliographies are permanently converted to static text. |
| Citations | New to Office Word 2007. | Citations are permanently converted to static text. |
| Placeholder text | New to Office Word 2007. | Placeholder text is permanently converted to static text. |
| Office Art 2007 | Office Art engine is improved upon in the 2007 Office release. | All Office Art 2007 objects are converted to Office 97–2003 objects. These objects are refreshed when the document is converted. When a user selects SmartArt in Office Word 2007, the Diagram Gallery from Word 2003 appears. |
| SmartArt Diagram | Some diagrams are new to the 2007 Office release. | Diagrams in the 2007 Office release are converted to non-editable images. When the document is converted, these images are refreshed to 2007 Office release again. When a user selects SmartArt in Office Word 2007, the Diagram Gallery from Word 2003 appears. |
| Custom XML Data store | New to the Open XML Formats, custom-defined XML information can be stored as a separate component within the Open XML Formats, to help organizations include content from their own data sources, using their own languages. | The XML data store is removed during conversion, and XML data and content within XML bindings are permanently converted to text. |
| Vertical Text Box Alignment | Vertical text box alignment of center or bottom is new to Office Word 2007. | Vertical text box alignment of center or bottom is permanently converted to top vertical text box alignment. |
| Office Charts | Charts can now exist as native objects in Office Word 2007. | Office charts are converted to Excel OLE objects. These objects are refreshed when the document is converted back to 2007 full functionality mode. When a user selects Charts in Office Word 2007, the Diagram Gallery from Word 2003 appears. |
| ActiveX | ActiveX controls can be added to Word documents to deliver enhanced functionality. | Disabled ActiveX controls are converted to their image representation when saved to a downlevel file type or opened in a downlevel application version through the converter. |

Katriel 

How much does excellent documentation save your company?

July 31st, 2007

Advertising Age reported yesterday that 20 MINUTES is “the average amount of time a consumer spends trying to set up a device before giving up”. Figure out how many returns or support calls that generates for your company, and how much those returns or support calls cost, and you have your business case for better documentation!
Katriel

Word 2007 - staying oriented in the new “ribbon” interface

July 15th, 2007

IMHO, the Word 2007 ribbon interface is a big improvement.  However, it takes time to get your bearings.  (I’ve been using Word 2007 full-time for about 9 months, since the beta period, and I still find myself scratching my head trying to remember where to find a particular function that I could find in my sleep in earlier versions.) 

So you may want to check out the Get Started tab, which you can download from the Microsoft site.

You can also download a workbook from the Microsoft site that lists the locations of Word 2003 commands in Word 2007. Recommended!
Katriel

“A bit hazy on the difference between XML and XSD”

July 15th, 2007

A.S., a loyal reader, writes, “I’m a bit hazy on the difference between XML and XSD”. Well, hopefully this post will clarify the issue for you.

The schema (XSD) describes what must be in the XML document.  For example, it might say that every item must have one catalog number, and one name, but may have one or more sizes (e.g. 500 gram and 750 gram).

The XML document would list what’s in the catalog.  For example:

100
Corn Flakes
500
700

200
Bran Flakes
500
750
1000
1250

In the above case, the schema (XSD) would declare the XML file invalid if it had no catalog number – or if it had 2 or more catalog numbers.
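For readers who think better in code, here is a rough Python sketch of the same idea. The element names (catalog, item, catalogNumber, name, size) are my own assumptions for illustration, and the hand-written checks only stand in for what a validating parser would do with a real XSD. The point is the division of labor: the XML holds the data, and the rules live somewhere else.

```python
import xml.etree.ElementTree as ET

# The XML document: it simply lists what is in the catalog.
# Element names here are hypothetical.
CATALOG_XML = """
<catalog>
  <item>
    <catalogNumber>100</catalogNumber>
    <name>Corn Flakes</name>
    <size>500</size>
    <size>700</size>
  </item>
  <item>
    <catalogNumber>200</catalogNumber>
    <name>Bran Flakes</name>
    <size>500</size>
    <size>750</size>
    <size>1000</size>
    <size>1250</size>
  </item>
</catalog>
"""


def check_item(item):
    """Return the rule violations for one <item>; an empty list means valid."""
    problems = []
    # These are the kinds of rules a schema (XSD) would declare.
    if len(item.findall("catalogNumber")) != 1:
        problems.append("item must have exactly one catalogNumber")
    if len(item.findall("name")) != 1:
        problems.append("item must have exactly one name")
    if len(item.findall("size")) < 1:
        problems.append("item must have at least one size")
    return problems


def check_catalog(xml_text):
    """Check every <item> in the catalog and collect the problems found."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    for index, item in enumerate(root.findall("item"), start=1):
        problems.extend("item %d: %s" % (index, p) for p in check_item(item))
    return problems
```

check_catalog(CATALOG_XML) returns an empty list, while an item with no catalog number, or with two of them, comes back with a complaint.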
Katriel
BTW, DITA processors generally use DTDs rather than XSDs, but that’s another post.

Word 2007 - the file format

July 15th, 2007

Word 2007 creates, by default, “.docx” files rather than “.doc” files. If you need to share .docx files with users of earlier versions of Word, you can save them as .doc files. If you do not have Word 2007 but have received .docx files, no need to worry. Just download the compatibility pack from Microsoft, which allows older versions of Word to open .docx files.

When saving as .doc files Word will warn you about any features that are likely to be problematic.  In my experience to date, Word has been conservative — warning about relatively minor problems.
Katriel

“I deliver help, and already have a HAT, why would I possibly benefit from DITA?”

July 9th, 2007

Avi, a loyal and critical reader, asks “I already have a suitable tool (RHX5), what could I possibly benefit from… ”

Well, RoboHelp is certainly a reputable tool. And, if it works for you, then remember the first rule from Engineering 101: “If it works, don’t fix it”.

But if you need to deliver content in multiple channels (PDF, for example, in addition to online help), if you need to tailor content for specific audiences, if you want to reuse content for different needs (implementation, training, user guide, troubleshooting, support, etc.), if you need to cut down on translation costs… then IMHO you should be thinking seriously about DITA.
Katriel

P.S. We have posted a new white paper: Find out why DITA matters and what’s in it for you.

Winston Churchill and DITA

July 5th, 2007

Winston Churchill on the cover of Life. The greatest enemy of a great technical writer solution is the dream of a perfect solution.

Is DITA perfect? No. Is it as easy as just writing without thinking about structure, just as we might in Word or FrameMaker or an HTML editor? No. Do writers need to learn how to think in topics and use a DITA editor? Of course.  Is there friction in the move to DITA? Absolutely.

This being said, we should paraphrase Winston Churchill: “It has been said that DITA is the worst approach to technical documentation except all the others that have been tried.”  Go for it - we technical communicators have nothing to fear but inaction!
Katriel