ENGLISH FOR SOFTWARE LOCALISATION

Justin B Rye, 01-Apr-10

SECTION A – FOREWORD

Welcome to a reference collection of tips from my documentation reviews on the debian-l10n-english mailing list.

A1: introduction

These notes could go on the Debian Wiki, if it wasn't for the fact that typing a paragraph or two of text into my web browser is enough to remind me that editing text is easier in a text editor.  Besides, I don't want to have to defend my notes against well-intentioned sabotage by people who half-remember some piece of mumbo-jumbo handed down to them by their English teacher; this may be a prescriptive style guide, but it's one primarily designed to help people write the way competent native-speakers really do in the twenty-first century.  The idea is that next time I'm reviewing something claiming to be an unix software that allows to run an own irc-based proxy I'll just be able to point at prefabricated summaries of what's wrong with it.  (Yes, that's how I'm highlighting bad example usages.)

A2: contents

SECTION A – FOREWORD
1: introduction | 2: contents | 3: folklore
SECTION B – VOCABULARY
1: disallowed! | 2: false friends | 3: ambiguities | 4: odds and ends
SECTION C – GRAMMAR
1: relativisation | 2: definiteness | 3: tenses | 4: plurals | 5: modifiers
SECTION D – STYLE
1: dialect | 2: colloquialisms | 3: formalisms | 4: miscellaneous
SECTION E – ORTHOGRAPHY
1: spelling | 2: case | 3: hyphens | 4: ticks | 5: listings | 6: leftovers
SECTION F – CONTENT
1: general | 2: debconf | 3: extended descriptions | 4: synopses
SECTION G – AFTERWORD

A3: folklore

First I'd better get out of the way some grammar folklore rules with no particular basis in linguistic reality.  They have never been real features of the grammar of English as used by even the most universally admired writers – they're delusions propagated by people who want to be able to look down on all the members of the general public who fail to obey their imaginary rules.  We might nonetheless choose to abide by these taboos just to avoid the arguments.

Restrictive Which
English-speakers may introduce restrictive relative clauses (see C1) with either which or that.  The myth is that the ones in which that is used are the only grammatical ones.
Sentence-Final Prepositions
Not only are these completely grammatical, sometimes they're compulsory – here's one for Miss Fidditch to think about.
Sentence-Initial Conjunctions
People claim you can't use words like and or but to start a sentence.  But why shouldn't we?  It seems to have no trace of a rationale.
Singular They
This has been idiomatic since Middle English, and is the only natural way of saying something like I suppose either Alice or Bob must have lost their key.
Split Infinitives
I said I'd decided to slowly stop smoking cigarettes is better than any of the alternatives.

Some prescriptivists insist on usages like none of you knows whom he would choose if he were I, even though they're long extinct in most brands of natively spoken English.  Following their advice is a good way of making yourself sound as if you were brought up on the Lost Island of Snooty Robots.

SECTION B – VOCABULARY

English offers plenty of opportunities for picking the wrong word.  Sometimes it even seems to be systematic about it; for instance, it often presents a three-way choice between -ing noun, plain noun, or -ation noun, all of them more or less synonyms (some counting, a count, a computation).  The -ing words can be tricky to fit into a sentence, since they keep some of their old verbal habits, while -ation words tend to be fancy and abstract.

B1: disallowed!

This one crops up so often I'm putting it right at the top.

You can't allow to do something (as in this option allows to compile code).  You can say that this option allows you to compile code, or this option allows code compilation, or even this option allows code to be compiled; but if there's no object immediately after the verb, it's almost certainly ungrammatical (and the same goes for permit to).  Native-anglophone readers will know what you mean, but they'll also suspect you've got a funny accent.

Besides, unless the software is something like PAM, how likely is it that it literally allows me to do something otherwise forbidden?  It enables or simplifies doing things, or helps me do them, or simply does them.
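
Since this one crops up so often, it's a natural candidate for a mechanical check.  The following is only a minimal Python sketch (the function name find_allow_to is made up for illustration, and it isn't taken from lintian or any other real tool); it flags allow to and permit to where no object intervenes, and deliberately leaves passives such as is allowed to alone, since those are grammatical.

```python
import re

# "allow", "allows", "allowing", "permit", "permits", "permitting",
# followed directly by "to" with no object in between.
# "allowed"/"permitted" are excluded so that passives are not flagged.
ALLOW_TO = re.compile(r"\b(?:allow(?:s|ing)?|permit(?:s|ting)?)\s+to\b",
                      re.IGNORECASE)


def find_allow_to(text):
    """Return every suspicious 'allow to' / 'permit to' phrase in text."""
    return [match.group(0) for match in ALLOW_TO.finditer(text)]


if __name__ == "__main__":
    for sample in ("this option allows to compile code",
                   "this option allows you to compile code"):
        print(sample, "->", find_allow_to(sample))
```

A hit still needs a human to choose between the possible rewrites, or to decide that the software doesn't really allow anything at all.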

B2: false friends

Well known cases where the English word doesn't mean what speakers of most European languages expect.

beware of…    when you mean…
actual        current
arrive        succeed
conscience    consciousness
consequent    consistent
demand        request
especially    specifically
eventual      random/possible
experiment    experience
few           several/a few
funny         fun
mention       give/specify
pretend       claim
relative      relevant
respective    corresponding/appropriate
sensible      sensitive

B3: ambiguities

Each of the following words has more than one well established idiomatic meaning, so you need to be aware of the possible misinterpretations.

Archive
Package repositories are archives, but so are individual .deb files (they're ar archives).  There are quite a few technical labels for subdivisions of the Debian archives, including area, distribution, component, release, and section.  Most of them present opportunities for confusion.
Binary
If you mean to include Perl utilities and exclude JPEGs, executables is clearer.  If instead you're talking about Debian binary packages, those are officially so called regardless of whether they contain binary data or ASCII text; even the ones providing kernel source-code count as binary rather than source packages.  Both kinds are accessed via the kind of sources listed in /etc/apt/sources.list.
Console
This is commonly used as the opposite of graphical, but also more narrowly as run in a VT login (like startx).  And then there are console games.
Database
The .odb file?  The collection of abstract tables?  The package?  The RDBMS executable?  The process?  The information it stores?  Even MySQL server can be either software or hardware.
Desktop
In software terms, either the virtual workspace presented by a Desktop Environment or the suite of programs used to implement this; in hardware terms, either a specific label for a non-tower form factor or more often a general term for immobile workstations (even if they live under the desk).
Directory
A folder in my file system or an LDAP-style database?  (Oh, and is a file system the storage volume presented as a directory hierarchy under some mount point, like /home, or is it the storage format, like NFS?  But somehow this one never seems to cause trouble.)
E-mail
Is an e-mail an address or a message?  (Compare the IP – an address or a piece of Intellectual Property?)  There's some disagreement over the hyphenation of E(lectronic) Mail; my own fingers insist on e‑mail, but various authorities prefer email, so maybe we should follow them.
Online
Is the online documentation for an online game stored on my /usr partition or their wiki?  This confusion is traditional, but not all traditions are worth preserving.
Orphan
A package without a maintainer (as per Debian Policy) or a stray installed package with no reverse dependencies (see deborphan(1))?  Sometimes that second type is labelled as obsolete packages, but that's the word aptitude uses to mean installed packages with no current version in the archives.
Root
All sorts of things get called root, from directories to servers to windows (and things are even worse for those of us who pronounce route as a homophone).  Always make it clear whether you're talking about the administrative login for my addressbook database or whether you mean the system superuser.

B4: odds and ends

Abbreviations
It's easy to let abbreviations from your native language (p. ex., for example) slip through untranslated.  Resp. (or, worse, BZW) is a particular giveaway: English doesn't have a generally recognised abbreviation for respectively, because we hardly ever use the word.  Most of the time the best idiomatic translation is either or or nothing.
Come to that, even abbreviations that do exist in English may be worth avoiding for stylistic reasons.  Replacing the Latinisms i.e. and e.g. with equivalent English phrases (that is, such as) can make a text seem subtly less technical, and eliminates the danger of confusing them.
Based
The word -based is often unnecessary padding.  An Ajax-based app is the same as an Ajax app, a network-based connection is a network connection, a Qt-based GUI is a Qt GUI, and so on.
Logins
Is it to login to my PC, to log in to my PC, or to log into my PC?  Well, the noun is one word, a login; but for the verb, since you can log yourself in it must be two words (the same rule applies for backup, breakdown, checkout, logout, lookup, setup, and shutdown).  Then the in to isn't the kind that means into; it's just a coincidental sequence of in and to (compare giving in to temptation), so the form I'd recommend is log in to.
Management
Although admins spend their time maintaining their systems using APT while developers are managing software releases, it's the former activity that's known as package management while the latter is package maintenance!
Wares
All the -ware words are uncountable; that is, there's no such thing as a firmware or several hardwares.  Instead it's treated as a material – some glassware, a piece of malware.  Much of the time if you've written softwares the word you were looking for was programs or applications.  While I'm on the subject, notice that software is installed on a computer, but hardware is installed in a computer.

SECTION C – GRAMMAR

By which I mean an obviously incomplete survey of syntax, morphology, and so on.  If you're looking for apostrophe-pedantry, it's filed under Orthography.

C1: relativisation

English has four basic types of relative clause.

  1. Ones like this, which you construct using which (or who, whereby, or some other WH-word) preceded by a comma.  These are descriptive relative clauses, and only add supplementary, parenthetical information; Germans should be careful not to confuse them with the following.
  2. Ones which are constructed using a WH-word without a comma.  These are known as restrictive relative clauses, on the grounds that they define an identifying characteristic of the entity in question.  The main problem with them is the fanatical which-hunters who want to have them declared ungrammatical.
  3. Ones that you construct using that.  These are another brand of restrictive relative phrase; they have the advantage of not waving a red flag at the pedants, but then again, using that rather than who with a human referent tends to sound a bit stilted to many native speakers (including me).
  4. Ones you construct using no such word.  A third way of forming restrictive relative clauses – lightweight, but often hard to follow.

If in doubt, don't overlook the option of cutting it into two or more separate sentences.

C2: definiteness

Definite versus indefinite versus nothing is far too complicated to explain here beyond the rule of thumb that the definite article the is for when both writer and reader can identify the thing being referred to.

The question of whether it's the file FOO or the FOO file is a similar issue of information management, since the answer is that it's either, depending mainly on what's news and what's background knowledge:

Non-native speakers also tend to have trouble guessing whether to refer back to a previously mentioned idea with this or that.  This can be really difficult (that was an example).

On the other hand, the rule for whether the indefinite article is a or an is clear-cut as long as you ignore the spellings – what you need to know is how the following word is pronounced, and whether it begins with a consonant sound (a laptop, a one-off, a USB device) or a vowel sound (an option, an hour, an xterm).  Unfortunately, there are a few debatable cases, since some of the things we may need to refer to don't have established consensus pronunciations.  Is it a straced process or an straced process?  An URL or a URL?  A mkfs variant or an mkfs variant?  Sometimes the same people even alternate between saying an /etc/hosts file (pronouncing it etcetera-hosts) and a /etc subdirectory (slash-ee-tee-cee).
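
The catch for anybody hoping to automate the a/an rule is that a program only sees spellings.  Here is a minimal Python sketch of one way round that, assuming a hand-maintained table of words whose first letter misrepresents their first sound; both the table and the function name are hypothetical, invented purely for illustration.

```python
# Hypothetical override table: True means "starts with a vowel sound",
# False means "starts with a consonant sound", whatever the spelling says.
VOWEL_SOUND_OVERRIDES = {
    "hour": True,      # silent h
    "xterm": True,     # pronounced "ex-term"
    "one-off": False,  # pronounced "wun-off"
    "usb": False,      # pronounced "you-ess-bee"
}


def indefinite_article(word):
    """Pick 'a' or 'an' for word, trusting the override table over spelling."""
    key = word.lower()
    if key in VOWEL_SOUND_OVERRIDES:
        starts_with_vowel_sound = VOWEL_SOUND_OVERRIDES[key]
    else:
        # Fallback guess from the spelling alone.
        starts_with_vowel_sound = key[0] in "aeiou"
    return "an" if starts_with_vowel_sound else "a"


if __name__ == "__main__":
    for word in ("laptop", "one-off", "USB", "option", "hour", "xterm"):
        print(indefinite_article(word), word)
```

Words with no consensus pronunciation (straced, URL, mkfs) would still need somebody to decide which entry goes in the table, so the sketch moves the argument rather than settling it.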

C3: tenses

There's no room here for a full explanation of the rules of the English tense system (besides, if I said that technically it has a grand total of two tenses I would only confuse people…) but here are some hints for the bits I see causing trouble most often.

Watch out for the subtle distinction between the simple past tense, which marks things as over and done with, and the perfect construction with have, which marks them as having continued relevance (not quite the same thing as being recent).  There are slight differences in usage between dialects, but basically, a warning message saying FOO was broken suggests that it is now fixed; a warning message saying FOO has been broken implies the opposite.

English has a system of sequence of tenses, where past tense marking on a main clause spills over onto subclauses: I said my name was Sam.  This can even happen when the tense mark on the main clause doesn't really indicate past time: I could stop tomorrow if I wanted.

The used to construction, as in I always used to get this wrong, has the annoying quirk of lacking any present tense equivalent – it would be logical if you could carry on with …and in fact even now I'm still using to get it wrong, but alas, natural languages don't run on logic!  What's more, even when you get the grammar right, the used to construction can easily lead to confusion.  For example, the software used to do this would be fine in speech (since the past-habitual marker is pronounced yoost instead of yoozd), but it becomes ambiguous when written down.

Some dialects have complex rules for when you should use shall rather than will, but not mine – grep tells me I only use shall when I'm quoting something that includes the word shall.

C4: plurals

All of the following things are singular in English (or at least, it would be grammatically correct to follow them with is here rather than are here):

Non-count nouns also take singular agreement: all software is fallible, and so is mathematics, or (this side of 1950) data.  On the other hand, a lot of people are here, while Alice and/or Bob can go either way.

Although each politician takes singular agreement, and faces are ordinarily distributed on a one-per-person basis, it's entirely non-satirical to say the politicians showed their faces.  Their face would imply it was shared.

Nouns that modify other nouns are usually unpluralisable (just like the adjectives they resemble), so a collection of managers of windows is a window manager collection, not a windows managers collection.  But then again, a conference of managers of events is quite likely to be an events managers conference, and I can't offer any sort of rationale for these exceptions.

C5: modifiers

Most adjectives can occur either before the thing they describe or after a linking verb (lonely Jim is lonely); a few can't (the sole survivor stood alone versus the alone survivor stood sole).  The word own may resemble an adjective, but it isn't allowed to appear in either position without the support of a possessive word.  Its own name is fine, but an own name has to become a name of its own.

Nouns used as adjectives don't behave exactly like natural adjectives.  They pile up immediately before their head noun, never mixing in with the adjectives to participate in phrases like a simple, shell, useful script.

The dangling modifier is another of those deprecated constructions that native speakers get away with all the time: after reinstalling my PC the bug got worse!  Interpreted pedantically, this sentence claims that the bug performed the reinstallation…

SECTION D – STYLE

Matters of style are essentially arguable, but if you don't want my advice, you don't have to ask for it.

D1: dialect

When people say something is bad grammar, what they often mean is that it obeys the grammatical rules of the wrong dialect, which is a stylistic issue.  The real reason for avoiding slangy or dialectal usages isn't that they're inherently bad, it's that they're less universally understood, especially by readers who are themselves non-native-speakers.

As you may have noticed, even though this page is itself written in my usual British-English HTML style, the variety of English it recommends for debconf templates is the one that goes with an en_US locale.  Other Debian subprojects use en_GB, or have no standard – and even in package description reviews we're often better off letting people follow whatever standard they know best rather than forcing them to adopt one they're uncomfortable with.

Educated American English isn't completely homogeneous anyway; and where there's variation we need to avoid confusing or annoying speakers of either variety.  Take for example the unpleasantly ambiguous phrase in case.  For some anglophones, the instruction unplug your PC immediately just in case of a short circuit means conditionally, if and when a short circuit occurs, unplug your PC; for others (including me) it means unconditionally, to avert a short circuit, unplug your PC now.

D2: colloquialisms

Using an informal register has the advantage that it can give a friendly impression; but there's also a risk that this chumminess may be unwelcome in a context where your readers just want you to get on with conveying information concisely and coherently.  Spoken English tends to leave more things implicit, since a real-world context normally makes what you mean instantly apparent.  A classic example of a usage that's frowned upon in formal writing but taken for granted in conversation is the ambiguous use of like: does options like FOO mean those options that resemble FOO, possibly excluding FOO itself, and certainly excluding options unlike FOO?  Or does it mean any arbitrary option, such as FOO?

Colloquial English often uses sequences of independent clauses, you just splice them one after another with nothing to signpost how they fit together, they're called run-on sentences, like this, see?  Constructions like that are deprecated in writing, but often all that's needed to fix them up is a few commas promoted to semicolons.

Addressing the audience directly with second person (you/your) has advantages and disadvantages – see F2 – but first person (I/me/mine/we/us/our) in documentation is generally a bad idea; it's not only informal, it's also confusing.  Is the speaker the upstream author, some random NMUer, or an animated paperclip?

D3: formalisms

An excessively formal register should also be avoided.  Convoluted uses of balanced antitheses within multi-line relative clauses within hypothetical conditionals can be a very concise way of saying something, but they force readers to do extra work to unpack it.  Even when your display of syntactic knotwork is technically perfect, if it bores everybody into skipping that paragraph you might as well not have written it.

Long, elaborate sentence structures can increase the risk of scoping ambiguities: One should not fail to avoid making a foolish error and leave the button unpressed.  On the other hand once you start breaking everything up into bite-size chunks there's the danger you'll introduce referential ambiguities: There's a button above the off switch.  The off switch should be recognisable because it's red.  Press it.

The impersonal pronoun one (as in using this emulator one can play arcade games) almost always strikes me as hopelessly formal; either replace it with generic you or rephrase the whole thing.  Similarly, over-reliance on passive verbs (a test-tube was heated) is generally unpopular.  Contrary to its bad reputation, the passive voice sometimes provides the most natural and direct way of continuing a sentence (walking in the door, I was greeted by my friend Pat, so I went over…); but that's no excuse for saying please note that it is important that the button should immediately be pressed when you mean press it!

Revising a sentence to introduce or eliminate a passive construction is an opportunity for syntactic problems to creep in and leave you with your pronouns pointing at the wrong things:

[ORIGINAL] Once FOO has installed BAR, it should be removed.
[HALF-EDITED] Once BAR has been installed, it should be removed.
[FIXED] Once BAR has been installed, FOO should be removed.

D4: miscellaneous

NOTE that tags saying NOTE are a bad sign.  Documentation is entirely constructed out of strings of notable points, tacked together into (preferably) coherent paragraphs.  If you need to sprinkle it with labels saying READ THIS PART, that probably means it's a bit of a mess.

Gender-neutralising by explicitly saying he or she is often clunky (though not as ugly as telling half the human race they don't count as people).  If you want to avoid breaking the taboo against they with a singular, there are some alternatives that avoid the issue:

Avoid unnecessary redundancy and repetition.  Even if it makes sense to refer to the same thing several times, it's considered poor style in English to use the same word repeatedly unless it's deliberate emphasis.  This rule can cause a lot of trouble if you're trying to describe how users usually used to use useful userspace usage-monitors…

SECTION E – ORTHOGRAPHY

This is the field where I'm most likely to be bossy, since languages and writing systems are two different kinds of thing.  Once there's a community of mother-tongue English-speakers who have grown up talking about less items, complaints from people who say fewer items are pointless – it's one of the ways English is spoken, so it gets to be listed in dictionaries.  But orthographies are artificial rule-systems propagated via schools, and have no native speakers.  If you spell it as fiewer itoms then you're just failing to comply with the standard.

E1: spelling

If you run lintian with all the optional bells and whistles turned on it has checks for quite a few common typos.

Yes, I'm an en_GB-er myself, but US spellings strike me as a clear improvement in the vast majority of cases.  The best known difference is that en_US expands i18n as internationalization, while en_GB mostly uses internationalisation.  However, the OED prefers -ize (as did The Times until quite recently), and there are a few words that are -ise in both systems, including advertise, compromise, exercise, promise, revise, supervise, and surprise.

Other major categories of divergence:

GB          US         Notes
centre      center     (but always ogre, auger)
colour      color      (but always glamour, error)
dialogue    dialog     (but always fugue, Prolog)
mediaeval   medieval   (but always aerial, query)
travelling  traveling  (but always felling, feeling)

The un-American spelling programme still exists, as a British word for TV shows and the like, but these days the computer variety is always program.

E2: case

Package synopses are rather like titles, but that doesn't mean they take Lots of Upper-Case Letters; the Developer's Reference recommendation is not to give them extra capitalisation.  This doesn't mean that you should write gNU, though!  We have to distinguish situational capitalisation, imposed by context, from lexical capitalisation, which is part of the spelling of a word.  A normal word can vary from all-lower-case to first-letter-upper-case to all-upper-case depending on factors like whether it's at the start of a sentence or whether it's in a newspaper headline.  But words like GNU or Linus or English involve letters that are inherently upper-case, written that way regardless of context.

Words with intrinsically lower-case characters are rare outside the world of science and technology (where it can mean the difference between millitesla and megaton).  But in IT, strings such as /usr/bin/perl or itsupport@example.org often have to be invoked precisely verbatim, and even strings like http or usb may need to be entered in a configuration file in lower-case.  The same logic is often applied to package names such as awk or gnome, which may be left uncapitalised at the start of a sentence in documentation.  After all, apt-get install Exim4 won't work [my original example there used aptitude, but that will work on Jessie!].  Rather than insist on a stylistic policy for this issue that requires people to agree on some particular obscure analysis, it's safest to advise keeping package names out of sentence-initial position where possible.
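
To show why a capitalised package name can't simply be treated as a stylistic variant, here is a minimal Python sketch of a name check.  It assumes the character rules that Debian Policy gives for package names (lower-case letters, digits, plus and minus signs, and full stops; at least two characters long; starting with an alphanumeric); the function name is my own invention, not part of any Debian tool.

```python
import re

# Pattern assumed from the Debian Policy rules for package names:
# an initial lower-case letter or digit, then one or more characters
# drawn from lower-case letters, digits, "+", "." and "-".
PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.-]+")


def is_valid_package_name(name):
    """Return True if name is spelled the way a package name must be."""
    return PACKAGE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("exim4", "Exim4", "gnome", "GNOME"):
        print(name, is_valid_package_name(name))
```

Under those assumed rules exim4 passes and Exim4 doesn't, which is the whole argument for treating the lower-case letters as part of the spelling.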

Upstream software project brand names are a different matter, and are upstream's decision.  If they call it FOObar or FooBar we should respect the capitals, but if their website calls it the foobar project it's not clear whether they're leaving it unmarked or declaring it uncapitalisable.  Incidentally, does anybody have any idea under what circumstances it's appropriate for Debian documentation to label brand names as registered trademarks?  My own suspicion is that there's never any serious reason for us to put such labels on anything; if we were going to get sued for not saying Microsoft® Windows® it would have happened a decade ago.

One context where I'm happy to see what looks like title-case in a package synopsis is for things like cups, where including the expansion as Common UNIX Printing System makes it easier to see at a glance that it's doing double duty as an explanation for the name as well as a description.

E3: hyphens

Compounds like front end tend to become front-end and then frontend as the term gets used more.  Programmers are often early adopters of new jargon, so there's an unfortunate tendency for documentation to be written in a style that's unfamiliar and offputting for the readers who need it most.  Feel free to talk in your private shorthand on the development mailing list, but try to stick to the more newbie-friendly forms (file system, web server) when you're addressing the wider public.

I know of a couple of gotchas: being online isn't the same as being on line, plaintext is not the same thing as plain text, a username is not the same as a user name, and userspace isn't user space.  You'd think the hyphenated versions would make good compromise candidates, but that rarely seems to work… instead my own rule of thumb is: if Wikipedia still treats it as two words, that's what the average reader probably expects.

Structurally complex noun phrases tend to acquire hyphenation not because they're becoming single words but just to make it easier to distinguish (e.g.) a real-time machine-translation system from a real time-machine translation-system.

Extra hyphens also occur with phrasal modifiers like an easy-to-use application, but here they serve to mark the whole thing as a unit; the hyphens aren't needed when the same phrase appears after a linking verb (it's easy to use).  You might think the same applies to multi-word modifiers made up of adverb plus participle, as in an easily used application, but since these are never structurally ambiguous a hyphen is considered redundant.

E4: ticks

(A cover-term for backticks, apostrophes, and opening or closing single or double quotation marks.)

The rules for apostrophe use are an obstacle course of arbitrary complexities, where errors are usually spell-checker-proof (and the real joke is that they almost never cause ambiguity – we could get along happily with no apostrophes anywhere).  English possessive apostrophes are particularly shambolic.

There's some debate about the use of apostrophes on inflected forms of numbers, acronyms, and so on (GUI's, GPL'ed, 1990's).  Most style guides recommend leaving them out (one OS, many OSs), but this advice isn't widely followed.

The logical style of quotation mark placement, where punctuation is kept outside the bracketing quotes unless it's part of the original text, is prohibited by many US style guides… so let's ignore them in favour of the Jargon File.

And then there's the question of single versus double quotation marks versus fancy Unicode ones.  I personally prefer to stick to ASCII in contexts where users are likely to want to do command-line searches or use copy-and-paste.  I also use the " character by default, reserving the ' character for use as an apostrophe or second-level quotation mark.  Although that's what I learned at school, people tell me it's the American style; and by happy coincidence it's also the style preferred on d‑l‑e, but as long as a given text is consistent I won't object particularly.  (Well… not unless you're using ``TeX'' quotation marks, that is.  Please don't; I'm sure they would get typeset into something beautiful if only they were being post-processed by LaTeX, but sitting there in my terminal emulator they'll just look rubbish.)

Some writers use single quotation marks not to indicate quotations but as an ASCII workaround for tagging verbatim strings – the sort that I'm HTMLising here in a nonproportional font.  Thus for instance they might say that 'remake' is yet another "simple" replacement for 'make'; this is all very well, but trying to apply it consistently would often make text look too fussy.

E5: listings

Lists where some of the items are themselves slightly complex often benefit from being rephrased (and in particular re-ordered) for clarity.  For instance, it supports FOO, BAR, and BAZ with QUUX or QUUX2 is ambiguous in a way that it supports BAZ with QUUX or QUUX2, plus FOO and BAR is not.  Another tactic is to upgrade the separators between list items from commas to semicolons:

[UNCLEAR] spam, bacon and eggs, and spam, eggs, bacon, and spam
[CLEARER] spam; bacon and eggs; and spam, eggs, bacon, and spam
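
The separator-upgrading tactic is mechanical enough to sketch in code.  The following Python function is only an illustration (join_list is a made-up name); it assumes a serial separator before the final and, and switches from commas to semicolons whenever any item already contains a comma of its own.

```python
def join_list(items):
    """Join items into an English list, upgrading commas to semicolons
    when any individual item contains a comma."""
    items = list(items)
    if len(items) < 3:
        # Zero, one, or two items need no separators at all.
        return " and ".join(items)
    separator = ";" if any("," in item for item in items) else ","
    head = f"{separator} ".join(items[:-1])
    return f"{head}{separator} and {items[-1]}"


if __name__ == "__main__":
    print(join_list(["FOO", "BAR", "BAZ"]))
    print(join_list(["spam", "bacon and eggs",
                     "spam, eggs, bacon, and spam"]))
```

It can't tell that bacon and eggs is a single dish, of course; deciding where the item boundaries fall is still the writer's job.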

Where a list is organised by bullet points, d‑l‑e has developed a sort of house style.

It features:
 * leading single-indented asterisk (or maybe dash);
 * semicolon at the end of each item;
 * final period (full stop).

However, a simpler approach, less integrated into the surrounding text, is still okay by me as long as it's self-evident what it's a list of.

 * Independent items
 * Asterisks
 * Capitalisation
 * No other punctuation (or not much)

Lists read more smoothly if items are kept structurally parallel – usually all adjective phrases, all noun phrases, or all verb phrases, not a mixture.

Avoid writing them like this.
   o  broken parallelisms!
   o  insufficiently similar;
   o  Don't go together very well

Mind you, if it's only two or three bullet points it might work better as a plain old sentence; lists with sublists are particularly worth flattening.  And although it's important to make it clear whether the list is exhaustive, it's easy to overdo it – there's no need to say some of its features, for example, include (but are not limited to) FOO, BAR, and BAZ, among many others!

E6: leftovers

Ampersands:
Using & within text is considered informal (though for some reason it's okay for Baz & Quux, Solicitors).  Slash as a shorthand for or is often worth avoiding too, since it's easily misinterpreted (compare text/html, TCP/IP, and CVSROOT/config).
Commas:
Commas present different difficulties depending on where you acquired your punctuation skills.  Europeans should beware of excess commas changing the meaning of their relative clauses; native anglophones should bear in mind that splicing paragraphs together with just commas is very informal.
There's a weak consensus in style guides that lists like FOO, BAR and BAZ usually need an extra (serial) comma: FOO, BAR, and BAZ.
Digits:
Lowish integers should usually be written out, while the rest follow LC_NUMERIC=en (999.999 is almost a thousand, and 999,999 is almost a million).  The European tradition of interpreting billion as tera- rather than giga- (and so on) is almost extinct in the UK, but meanwhile we've got tebibytes to worry about.
Ellipses:
The use of (FOO, BAR, ...) to indicate an open-ended list may be standard C syntax, but it isn't common in English prose; use an etc. instead of an ellipsis.
Emphasis:
The accepted ugly ASCII stand-in for emphatic mark-up is to tag text as *bold*, _underlined_, or (rarely) /italic/.  Keep it to a bare minimum, though – excessive emphasis is REALLY ANNOYING.
Exclamation marks:
These can occasionally be justifiable, but see above on emphasis!!1!
Question marks:
The use of interrogative forms in debconf prompts is tightly regulated: you're only allowed a question mark if it's Type: boolean.  When you need to turn is the Pope catholic? into something that technically isn't a question, the easiest approach is to transform it into please specify whether the Pope is catholic.
Spaces:
There should be no space before  :,  ?, or  !.  Between sentences we're standardising on one space rather than two, which isn't what I'm used to, but for a start it's more resilient against HTMLification.
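To make the Digits entry above concrete, here is a minimal sketch of the English numeric conventions, using Python's built-in format specifiers rather than the locale machinery. The helper name `format_en` and the constant names are assumptions chosen for this example.

```python
def format_en(value, decimals=0):
    """Format a number with ',' grouping thousands and '.' as the decimal point."""
    return f"{value:,.{decimals}f}"


# The two readings of "billion", plus the binary prefix that muddies things.
SHORT_SCALE_BILLION = 10**9   # giga-: the usual modern English meaning
LONG_SCALE_BILLION = 10**12   # tera-: the older European meaning
TEBIBYTE = 2**40              # bytes; slightly more than 10**12

if __name__ == "__main__":
    print(format_en(999.999, 3))          # almost a thousand
    print(format_en(999999))              # almost a million
    print(format_en(SHORT_SCALE_BILLION))
    print(format_en(LONG_SCALE_BILLION))
    print(format_en(TEBIBYTE))
```

Hard-coding the separators keeps the sketch independent of whichever locales happen to be installed; real software would normally leave this to its localisation framework.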

SECTION F – CONTENT

This is arguably outside the remit of a localisation mailing list, but while we're reviewing a piece of documentation it makes sense to do some fact-checking and general editing.

F1: general

The setting determines where the dividing line falls between technical jargon and general knowledge.  TLAs usually ought to be expanded or explained the first time they're used – and if they aren't used more than once, why waste time introducing the abbreviation in the first place?  But that doesn't mean you need to interrupt your DIY Integrated Circuits HOWTO to explain what a P.C. is.

F2: debconf

See the Debconf Spec and the existing Templates Style Guide (now part of the Developer's Reference).

Debconf dialogues should almost never need to mention debconf, or even the installer; these are technical implementation issues that should be transparent to the user.  Besides, mentioning installation in the middle of an upgrade or dpkg-reconfigure run is just confusing.

When you need to give an example hostname, don't give free advertising to myhost.com, randomword.com, or foo.com; use an RFC-compliant one like example.org.
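A reviewer could screen example hostnames against the names that RFC 2606 reserves for documentation and testing. The sketch below assumes that list (the top-level domains .test, .example, .invalid, and .localhost, plus example.com, example.net, and example.org); the function name `is_reserved_example_host` is made up for illustration.

```python
# Names reserved by RFC 2606 for documentation and testing.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}


def is_reserved_example_host(hostname):
    """Return True if hostname falls under a name reserved by RFC 2606."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Compare the last two labels, so that subdomains also count.
    return ".".join(labels[-2:]) in RESERVED_DOMAINS


if __name__ == "__main__":
    for host in ("myhost.com", "foo.com", "example.org", "www.example.org"):
        verdict = "fine" if is_reserved_example_host(host) else "free advertising"
        print(f"{host}: {verdict}")
```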

It isn't necessarily appropriate to ask would you like to reconfigure your server? if the reader might be a sysadmin reluctantly following corporate guidelines for software installations on the company's server.  (Second-person pronouns also tend to make life difficult for translators.)  All you know for sure is that it's up to the reader to answer the question should the server be reconfigured?
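Two of the points made in these notes about debconf prompts (question marks only for Type: boolean, and wariness about second-person pronouns) lend themselves to a crude automatic check. This is a sketch under stated assumptions: the function `review_template` and its two-argument interface are hypothetical, and it looks only at a short description string, not at a real templates file.

```python
import re

SECOND_PERSON = re.compile(r"\b(you|your|yours)\b", re.IGNORECASE)


def review_template(template_type, short_description):
    """Return a list of complaints about one debconf short description.

    Hypothetical helper: template_type is the value of the Type: field,
    short_description is the first line of the Description: field.
    """
    problems = []
    if short_description.rstrip().endswith("?") and template_type != "boolean":
        problems.append("question mark in a non-boolean template")
    if SECOND_PERSON.search(short_description):
        problems.append("second-person pronoun")
    return problems


if __name__ == "__main__":
    print(review_template("boolean", "Would you like to reconfigure your server?"))
    print(review_template("boolean", "Should the server be reconfigured?"))
```

The pronoun check is deliberately only a warning to a human reviewer; plenty of legitimate prompts will need to address the reader somewhere.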

F3: extended descriptions

See DevRef 6.2.1 to 6.2.3 (and, salvaged from the archives, some old guidelines by Colin Walters).

Questions like how the software is implemented and what standards it conforms to can wait.  The basic point of a package description is to announce what this .deb is for – what can it do to solve users' problems and make their lives more fun?

The project homepage is the easiest place to get this kind of text, but don't take that to mean you should just copy it word for word off SourceForge: their blurb isn't designed to convey the same information as a package description.  So diverging from upstream isn't an issue here any more than it's a problem that the man page is different from the FAQ.

Upstream blurbs may involve confusingly divergent specialised uses of terms like distribution or contrib package or (if you're unlucky) free software, and may be full of hard-sell advertising copy designed to compete with some unmentioned proprietary equivalent.  Remember that the interests of our users always take priority over the developer's ego; stick to an objective summary of the software's pros and cons.

Unless it's going in Section: (lib)devel, you should try to avoid developerese; the typical user only wants to know what your application is good for, not how it's implemented.  If libeg-bin is part of EGlib and provides a utility called eg_tool, don't assume that's self-evident, and make sure that the text makes sense as a description of libeg-bin.  If the significance of the name isn't obvious, the extended description is a good place to put an explanation.  (If it's a TLA you may be able to get away with just using the expansion as the package synopsis.)  I'm the kind of user who finds it easier to get a mental handle on a piece of software if its name has some intelligible connection to its function, so I often ask why the name? in d‑l‑e package reviews.  There seem to be quite a few programmers out there who are content to dub their project yix just because that's a quick and easy key-sequence to type on a Dvorak layout, but that label will often be the first aspect of their brainchild that people encounter as they browse through the menus.  Think of it as the most basic starting point of the user interface!

Reimplementations of existing software should be careful not to live in the past, phrasing their descriptions purely in terms of how libfoo-tng was an improvement on libfoo – especially if libfoo2 might have all the same features.  At best, once libfoo-tng succeeds in becoming the standard implementation and libfoo vanishes from the repositories, users will be left relying on software archaeology to work out what purpose your package serves.  And I can never resist pointing out just how eighties the fad of calling things The Next Generation is!  Beware dated content – references to boot-floppies or X11R6 support, game reviews assuming that 3D acceleration is a novelty, and so on.  In fact it's a good idea to avoid claiming that your package is notably modern (in ten years when it's an orphaned relic that text will be an annoyance); say what its features are (e.g. graphical), and let readers make up their own minds about whether that's an advantage.

Too Much Information:

F4: synopses

The balancing act between too little information and too much is particularly hard for short descriptions.  One thing you should usually leave out is the programming language – it might fit in the long description, but it's a waste of space to say that python-pylibpython-mcpython (Section: python) is written in Python.  Use debtags!

The Developer's Reference says that package synopses should be (articleless) noun phrases referring to the package – that is, they should fit the template $PACKAGE provides a/the/some $SYNOPSIS (though the alternative two-part format popular with large families of packages also has explicit DevRef backing).  They should not follow the example of the man pages that base their description line on verb phrases ($BINARY lets you $DESCRIPTION or $BINARY is designed to $DESCRIPTION).  The logic of standardising on noun phrases goes like this:

Apologies for the linguistics jargon (which strictly speaking isn't even accurate – I should be talking about N‑bars, not NPs).  I advise non-syntacticians just to focus on the template approach.
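The template approach can be tried out mechanically: slot the synopsis into the sentence and see whether any variant reads as English. The sketch below only generates the candidate sentences and flags a leading article; the function names and the sample synopsis are assumptions for illustration, and judging whether a sentence sounds right is still left to a human.

```python
ARTICLES = {"a", "an", "the"}


def template_sentences(package, synopsis):
    """Return the synopsis slotted into '$PACKAGE provides a/the/some $SYNOPSIS'."""
    return [f"{package} provides {det} {synopsis}" for det in ("a", "the", "some")]


def has_leading_article(synopsis):
    """Return True if the synopsis starts with an article (it should be articleless)."""
    words = synopsis.split()
    return bool(words) and words[0].lower() in ARTICLES


if __name__ == "__main__":
    # Hypothetical synopsis for the libeg-bin example package.
    for sentence in template_sentences("libeg-bin", "command-line utility for EGlib"):
        print(sentence)
    print(has_leading_article("a command-line utility for EGlib"))
```

A verb-phrase synopsis such as "lets you frobnicate widgets" produces nonsense in all three variants, which is the quickest sign that it needs rephrasing as a noun phrase.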

SECTION G – AFTERWORD

I suspect my reviews on the mailing list give the impression I'm some sort of nit-picking dimwit, so please bear in mind that the best way of spotting typos, grammatical ambiguities, missing definitions, and so on is to approach the text from the point of view of somebody who doesn't already know what it's trying to say.  If you find that sort of ignorance annoying, I apologise; but this may be an indicator that you should delegate the task of writing user documentation to others.

If you disagree with me about some point of grammar or style or whatever, don't worry; at the end of the day, it's the maintainer's decision, not mine, and you're welcome to join the mailing list to provide an alternative viewpoint!
