2011-02-24

The inelegance of the SI (or Ten Things I Hate About SI ;-)

Our globalised world needs a single, user-friendly measurement system so we can all communicate clearly about our physical world (health, food, energy, environment, infrastructure, transport, etc).

The International System of Units is conceptually elegant, but its notation is a historical mess, inherited from 150 years of irregularities in the metric system. It is also an administrative mess: the units and notation in the SI Brochure 8 (2006) (the primary scientific authority; 'SI8' here) and ISO 80000-1:2009 (the primary engineering authority) are inconsistent.

Consequently, the SI's clarity, usability and utility are impaired:
  • the subtleties and irregularities of the SI notation are difficult to understand and remember
  • SI expressions are unnecessarily difficult to write on a keyboard
  • SI expressions are problematic to display in a web page and transfer between computers
  • SI expressions are syntactically ambiguous, so cannot be parsed by a software style-checker or mathematical processor
These problems result in the SI being misused by writers/journalists, which contributes to public misunderstanding and further poor SI usage. The SI specification and notation should be made as simple and clear as possible, if only to encourage the citizens of the USA to adopt it! Here are the problems in the recent versions of the SI (SI6/7/8 and ISO 1000/80000), trivial and significant; some are fixable, for some it is too late. [Suggested fixes are in square brackets]

0. The two international SI specifications are inconsistent
SI8 and ISO 80000:
(a) imply different interpretations about expressing a quantity
SI8: "the value of a quantity is the product of the number and the unit, the space being regarded as a multiplication sign" [i.e. the space is explicitly the number-unit product symbol]
ISO: "The symbol of a unit shall be placed after the numerical value in the expression of a quantity, leaving a space between the numerical value and unit symbol" [i.e. the space is a formatting convention]

(b) specify different unit-unit product symbols
SI8 5.1: "Multiplication must be indicated by a space or a half-high (centred) dot (⋅), since otherwise some prefixes could be misinterpreted as a unit symbol."
ISO 80000 7.2.2: illustrates the allowed product symbols as "N ⋅ m" (with spaces), "N m" and "Nm", the latter being an implicit multiplier that can be used in certain combinations.

(c) specify different "non-SI units allowed for use with the SI"
SI8: min, h, d, °, ', ", ha, L/l, t, eV, u/Da, ua
ISO: min, h, d, °, ', ", L/l, t, eV, u

What on Earth do ISO/TC 12 and BIPM CCU think they are doing by running separate unit fiefdoms and versions of the mother of all international conventions?
[Fix: these specifications are reproduced and interpreted in many secondary documents and in legislation - ISO 80000 should be made identical with SI8 urgently.]


1. The implicit-product rule in ISO 80000 is confusing and widely misunderstood
ISO 80000 7.2.2: illustrates the allowed product symbols as "N ⋅ m" (with spaces) and "N m" and notes "The latter form may also be written without a space, i.e. Nm, provided that special care is taken when the symbol for one of the units is the same as the symbol for a prefix. This is the case for m, metre and milli, and for T, tesla and tera."
This clause is unclear (what does "take special care" mean?) and incorrect since only the leading unit needs to be unique (e.g. the symbol "hW" is ambiguous; the symbol "Wh" is unambiguous). It should simply state "except when the symbol for the first unit is the same as the symbol for a prefix". Consequently, the rule is widely misunderstood or ignored, resulting in many ambiguous SI expressions, such as "mAh" used to designate the capacity of household batteries.
It is also bad practice to illustrate the multiplier symbol "N ⋅ m", instead of describing it - from Acrobat Reader or a print copy, it is impossible to tell if it contains spaces, and this has a large impact on the parsability of an SI quantity expression.
[Fix: in any case, the ISO 80000 implicit multiplier should be discarded, as per SI 8, allowing humans/software to unambiguously identify both the prefix and unit symbols in a derived unit. The middle-dot multiplier should be explicitly stated]


2. The implicit-product rule in ISO 80000 causes ambiguities
Even when the implicit product symbol combines two unambiguous unit symbols, the result can still be ambiguous; e.g. "lm" represents both lumen and litre-metre.
[Fix: use an explicit multiplier to disambiguate this: lm is lumen, l·m is litre-metre.]
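The ambiguities in items 1 and 2 can be reproduced mechanically. The following Python sketch enumerates every reading of a symbol written with the implicit product. The tiny PREFIXES and UNITS tables and the function names are illustrative assumptions, not the full SI lists or any standard parser.

```python
# Hypothetical, deliberately small symbol tables (not the full SI lists).
PREFIXES = {"m": "milli", "h": "hecto", "k": "kilo", "T": "tera"}
UNITS = {"m": "metre", "h": "hour", "A": "ampere", "W": "watt",
         "N": "newton", "l": "litre", "lm": "lumen", "T": "tesla"}


def factors(symbol):
    """Return every reading of `symbol` as one (optionally prefixed) unit."""
    readings = []
    if symbol in UNITS:
        readings.append(UNITS[symbol])
    for p, pname in PREFIXES.items():
        rest = symbol[len(p):]
        if symbol.startswith(p) and rest in UNITS:
            readings.append(pname + UNITS[rest])
    return readings


def parses(symbol):
    """Enumerate all readings of `symbol` as an implicit product of factors."""
    if not symbol:
        return [[]]
    results = []
    for i in range(1, len(symbol) + 1):
        for head in factors(symbol[:i]):
            for tail in parses(symbol[i:]):
                results.append([head] + tail)
    return results


for s in ("Wh", "hW", "lm", "mAh"):
    print(s, [" * ".join(p) for p in parses(s)])
```

With these tables, "Wh" has a single reading, while "hW", "lm" and "mAh" each have two, which is why software cannot safely accept the implicit product.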

3. The unit-unit product symbol has too many options
The choice of three product symbols in ISO 80000 is bound to confuse users.
In SI 8, the allowed symbols are "N · m" and "N m", which reduces some of the confusion.
[Fix: there should be an explicit single representation of the product symbol, analogous to the SI's single representation of the divisor ("/"), allowing humans/software to unambiguously identify both the prefix and unit symbols in a compound unit. ]

4. The number-unit product symbol is not explicitly stated in ISO 80000
The representation of the multiplier between the numerical value and the unit is not explicitly specified in ISO 80000. Clause 7.1.4 states "Unit symbols...shall be placed after the complete numerical value in the expression for a quantity, leaving a space between the numerical value and the unit symbol." This reads more like a formatting suggestion than the international convention of quantity calculus. Consequently journalists often incorrectly concatenate the number and unit. The explicit specification of a number-unit multiplier would reveal the first axiom of quantity calculus (i.e. that a quantity is the product of a number and a unit).
[Fix: ISO 80000 should simply and explicitly state that the number-unit product symbol is a space.]
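If the space were stated to be the number-unit product symbol, a style-checker could enforce it with a simple pattern. The sketch below assumes a plain decimal number, exactly one space, and a unit symbol that does not start with a digit; the pattern and the function name are hypothetical.

```python
import re

# Assumed shape of a quantity: number, ONE space, unit symbol.
QUANTITY = re.compile(r"^(?P<number>[+-]?\d+(?:\.\d+)?) (?P<unit>[^\d\s]\S*)$")


def split_quantity(text):
    """Return (number, unit) if `text` is 'number space unit', else None."""
    match = QUANTITY.match(text)
    if match is None:
        return None
    return float(match.group("number")), match.group("unit")


for text in ("5 kg", "5kg", "90 km/h"):
    print(repr(text), split_quantity(text))
```

The concatenated form "5kg" is rejected, while "5 kg" splits cleanly into its two factors.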

5. Some SI symbols are duplicated.
The characters "d", "h", "m" and "T" are used both as unit symbols (day, hour, metre, tesla) and prefix symbols (deci, hecto, milli, tera). This can cause confusion, but does not cause ambiguity in SI 6/7/8 and in ISO 80000 if rule 7.2.2 is followed (i.e. if these symbols are not used before an implicit product symbol).
[Fix: none, it's too late]

6. The space and full-stop are ambiguous symbols
Since these symbols have other meanings in an SI expression, computer parsing of SI expressions is problematic. For example, in the ISO 1000 expression "1.234 567 x 10^3 N.m/s", the space represents a digit separator, two scientific-notation separators and the number-unit multiplier; the full-stop represents a decimal point and the unit-unit multiplier.
[Fix: the space should be reserved as a separator, and the full-stop reserved as a decimal point. ]
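To see how much context a parser needs, here is a minimal Python sketch that labels each space and full-stop in that expression by looking at its neighbours. It assumes the exponent is typed as "10^3" and the times sign as "x"; the role names and the function are my own illustration, not part of any specification.

```python
def role(expr, i):
    """Guess the role of the space or full-stop at index i from its neighbours.

    Assumes the character is never first or last in the expression.
    """
    ch, before, after = expr[i], expr[i - 1], expr[i + 1]
    if ch == ".":
        if before.isdigit() and after.isdigit():
            return "decimal point"
        return "unit-unit multiplier"
    if "x" in (before, after):
        return "scientific-notation separator"
    if before.isdigit() and after.isdigit():
        return "digit-group separator"
    return "number-unit multiplier"


expr = "1.234 567 x 10^3 N.m/s"
roles = [(ch, role(expr, i)) for i, ch in enumerate(expr) if ch in " ."]
for ch, name in roles:
    print(repr(ch), name)
```

Two characters end up carrying five different meanings, and each one can only be recovered by inspecting what sits on either side.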

7. A prefix symbol plus unit symbol forms another unit
E.g. in SI 6/7, where “Pa” can be “petaare” or “pascal”, and in ISO 1000/80000 and SI 6/7/8, where “cd” can be “centiday” or “candela”.
[Fix: change the symbol for day to "day"]

8. A prefix symbol plus unit symbol forms another prefix
E.g. in SI 6/7, where “da” can be “deciare” or “deca”.
[Fix: make all prefix symbols single-character - change "da" to "D"]
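The collisions in items 7 and 8 can be found by brute force. This Python sketch concatenates every prefix symbol with every unit symbol and reports results that coincide with an existing unit or prefix. The unit list is my own abridged selection (it includes the are, "a", as in SI 6/7, uses "g" rather than "kg", and leaves out the non-ASCII symbols); "u" stands in for the micro symbol.

```python
# The twenty prefix symbols; "u" is an ASCII stand-in for micro.
PREFIXES = ["Y", "Z", "E", "P", "T", "G", "M", "k", "h", "da",
            "d", "c", "m", "u", "n", "p", "f", "a", "z", "y"]

# Abridged, assumed unit list: "g" instead of "kg", plus "a" (are) from SI 6/7.
UNITS = ["m", "g", "s", "A", "K", "mol", "cd", "rad", "sr", "Hz", "N", "Pa",
         "J", "W", "C", "V", "F", "S", "Wb", "T", "H", "lm", "lx", "Bq",
         "Gy", "Sv", "kat", "min", "h", "d", "l", "L", "t", "a"]


def collisions(targets):
    """List (symbol, prefix, unit) where prefix + unit equals a symbol in targets."""
    found = []
    for p in PREFIXES:
        for u in UNITS:
            if p + u in targets:
                found.append((p + u, p, u))
    return sorted(found)


print("clashes with a unit:  ", collisions(set(UNITS)))
print("clashes with a prefix:", collisions(set(PREFIXES)))
```

With this list the only clashes reported are "Pa" (P + a), "cd" (c + d) and "da" (d + a), matching the examples above.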

9. Four SI symbols are not on a U.S. keyboard
Three of the 49 SI prefix/unit symbols (μ, Ω, °C) are (inconsistently) not Latin characters. None of these three symbols, nor the multiplier symbol "half-high dot" (·), is represented on the U.S. or Western-European keyboards. As the vast majority of scientific writing is in English/European languages, this creates unnecessary delays and difficulties for authors, who must generate these characters in different software by special codes, key combinations or menus. For example, in this HTML document, the required "character entity" codes are &mu;, &Omega;, &deg; and &middot;; this is hardly intuitive or simple for an author. If the symbols are copied from a Microsoft Word document (a common work-around), they are usually rendered incorrectly by non-Microsoft browsers, e.g. Mozilla Firefox renders "mu" as "m" and omega as "W". How does your browser render "m" and "W"?
[Fix: rename these to "u" and "Ohm". Use a keyboard character as the multiplier symbol - the asterisk is the universally recognised keyboard character for multiplication.]
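For reference, the named character references that HTML defines for these four symbols can be checked with Python's standard html module. This is only a sketch; the list and the helper name are illustrative.

```python
import html

# HTML named character references for mu, omega, degree and middle dot.
REFERENCES = ["&mu;", "&Omega;", "&deg;", "&middot;"]


def code_point(reference):
    """Return the 'U+XXXX' code point of a single-character reference."""
    char = html.unescape(reference)
    return f"U+{ord(char):04X}"


for ref in REFERENCES:
    print(ref, code_point(ref))
```

Four symbols need four different names to be memorised, none of which can be typed as a single key.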

10. These four symbols are not represented in common character sets
These symbols are not represented in the foundational 7-bit ASCII. The Greek character omega "Ω" also has no character in the ubiquitous ISO-8859-1 (Latin-1). One day, all operating systems and application programs will use that mother-of-all-character-sets Unicode; meanwhile these omissions can cause errors for software in rendering the symbols across different computer platforms.
[Fix: adopting fix #1 will ensure these are in the widely-recognised ASCII character set]
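The coverage claims can be checked directly by trying to encode each character. In this Python sketch the choice of code points is my assumption about which characters are meant: Latin-1 has a dedicated micro sign (U+00B5), which is distinct from the Greek letter mu (U+03BC).

```python
# Assumed code points for the four symbols (plus the Greek mu for comparison).
SYMBOLS = {
    "micro sign": "\u00b5",
    "Greek small mu": "\u03bc",
    "Greek capital omega": "\u03a9",
    "degree sign": "\u00b0",
    "middle dot": "\u00b7",
}


def encodable(char, encoding):
    """True if `char` can be represented in the given character encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


coverage = {
    name: {enc: encodable(char, enc) for enc in ("ascii", "latin-1", "utf-8")}
    for name, char in SYMBOLS.items()
}
for name, row in coverage.items():
    print(f"{name:20}", row)
```

None of the symbols survive ASCII, the omega and the Greek mu also fail in Latin-1, and only a Unicode encoding carries them all.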

11. The kilogram problem
The base unit of mass (kg) has a prefix, so one cannot form multiples by adding another prefix, e.g. "kkg" for tonne.
[Fix: many pedants suggest using the original French name Grave, or Giorgi or Galli, symbol "G"]

12. The prefix names are inconsistent.
Eight of the ten multiple prefixes are suffixed with -a, except for kilo and hecto. Seven of the ten submultiple prefixes are suffixed with -o, except for deci, centi and milli.
[Fix: rename these prefixes to kila, hecta, deco, cento, millo]


13. The prefix symbols are inconsistent.
Nineteen of the twenty prefix symbols are single-character, except for "da" which has two characters. This makes it difficult/impossible for software to parse and implement SI prefixes.
[Fix: change the symbol for deca to "D"]


14. The prefix symbols have mixed case.
One would expect in a consistent and elegant system of units, the prefixes for multiples would be UPPER CASE and the prefixes for submultiples would be lower case. No such luck. The prefixes kilo (k), hecto (h) and deca (da) are all lower case.
[Fix: change the symbols to "K", "H" and "D"]
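The irregularities in items 12 to 14 are easy to list from a table of the twenty prefixes. A small Python sketch, with "u" as an ASCII stand-in for the micro symbol and variable names of my own choosing:

```python
# (name, symbol, power of ten); "u" stands in for the micro symbol.
PREFIXES = [
    ("yotta", "Y", 24), ("zetta", "Z", 21), ("exa", "E", 18), ("peta", "P", 15),
    ("tera", "T", 12), ("giga", "G", 9), ("mega", "M", 6), ("kilo", "k", 3),
    ("hecto", "h", 2), ("deca", "da", 1),
    ("deci", "d", -1), ("centi", "c", -2), ("milli", "m", -3),
    ("micro", "u", -6), ("nano", "n", -9), ("pico", "p", -12),
    ("femto", "f", -15), ("atto", "a", -18), ("zepto", "z", -21),
    ("yocto", "y", -24),
]

multiples = [p for p in PREFIXES if p[2] > 0]
submultiples = [p for p in PREFIXES if p[2] < 0]

# Item 12: names that break the -a / -o pattern.
odd_multiple_names = [name for name, _, _ in multiples if not name.endswith("a")]
odd_submultiple_names = [name for name, _, _ in submultiples if not name.endswith("o")]
# Item 13: symbols longer than one character.
multi_char_symbols = [sym for _, sym, _ in PREFIXES if len(sym) > 1]
# Item 14: multiples whose symbol is lower case.
lowercase_multiples = [sym for _, sym, _ in multiples if sym.islower()]

print(odd_multiple_names, odd_submultiple_names)
print(multi_char_symbols, lowercase_multiples)
```

The exceptions it lists are exactly the prefixes that the fixes in items 12 to 14 propose to rename or re-symbol.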


15. The prefix symbols are not systematic.
One would expect in a really consistent and elegant system of units, the prefix for a multiple would be the UPPER CASE of the corresponding submultiple. This would reduce the number of prefixes to be learnt to 10. Belatedly in 1991, the 19th CGPM partially adopted this obvious idea and declared zetta (Z = 10^21)/zepto (z = 10^-21) and yotta (Y = 10^24)/yocto (y = 10^-24).
[Fix: none, it's too late]


16. Uneven distribution of letters
There is a very uneven distribution of initial letters used for unit symbols/prefixes. "S/s" (s, sr, S, Sv), "C/c" (C, °C, cd, c), "M/m"(m, min, mol, m, M) and "K/k" (kg, K, kat, k) are overloaded, which is a recipe for human confusion. The letters "b, D, e, g, i, I, j, o, O, q, Q, R, t, u, U, v, w, x, X" are unused.
[Fix: none, it's too late]


17. The principal symbol for the unit-unit multiplier is unknown outside science
The half-high dot (·), the principal symbol for multiplier in a compound unit, is a specialised mathematical symbol unknown to the news media and magazines (which typically write "MWh" for electricity consumption), and so is unfamiliar to non-scientific users.
[Fix: change the symbol to an asterisk, the universal keyboard character for the multiplier operator]


18. ISO 1000 is inconsistent regarding alternative symbols
The ISO 1000 specification illustrates the multiplier in a compound unit as "N · m" and notes this may be "written with a dot on the line (N.m) instead of a half-high dot for systems with a limited character set". This inconsistent concession is of limited use since no such allowance is made for the Greek characters or degree symbol.
[Fix: adopting fix #4 means that alternate multiplier symbols are unnecessary]


19. The specifications omit - use of prefixes
There is no explicit statement about which of the “units accepted for use with the SI” takes prefixes. ISO 1000 clause 7.2 states "prefixes given in table 4 may be attached to some of the units given in table 5 and table 6". SI 8 is similarly vague.


20. The specifications omit - numbers in a compound unit
An axiom of quantity calculus is that base units may be algebraically combined to form derived units, but the SI specifications do not explicitly prohibit the use of numbers in a derived unit, although this is surely the intention. Consequently, there is widespread incorrect usage of certain conveniently-sized compound units (e.g. automotive fuel consumption as L/100 km).
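Read literally, with the space as a multiplication sign and the usual left-to-right evaluation, "L/100 km" does not even have the dimensions of fuel consumption. The toy Python class below tracks only the powers of litre and kilometre; the class and its names are an illustrative assumption, not a real units library.

```python
from fractions import Fraction


class Q:
    """Toy quantity: a number times powers of litre and kilometre."""

    def __init__(self, value, L=0, km=0):
        self.value, self.L, self.km = Fraction(value), L, km

    def __mul__(self, other):
        return Q(self.value * other.value, self.L + other.L, self.km + other.km)

    def __truediv__(self, other):
        return Q(self.value / other.value, self.L - other.L, self.km - other.km)

    def __repr__(self):
        return f"{self.value} L^{self.L} km^{self.km}"


L, km, hundred = Q(1, L=1), Q(1, km=1), Q(100)

literal = L / hundred * km       # "L/100 km" evaluated left to right
intended = L / (hundred * km)    # "L/(100 km)" with explicit parentheses

print("literal: ", literal)
print("intended:", intended)
```

The literal reading gives litre times kilometre, whereas the parenthesised form gives litre per kilometre.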


21. The specifications omit - combining multipliers
There is no rule about combining different multipliers in a compound unit (e.g. is "kg m^2/(s^2·K mol)" acceptable for molar entropy?)


22. The specifications omit - typography of spaces
The font for quantities is italic, and for units is upright (e.g. italic 'm' means 'mass'; upright 'm' means 'metre'). The typography of spaces (i.e. the space width between groups of numbers, between numbers and units, and between units) is unspecified. In the font used in ISO 80000, the multiplier "half-high dot" appears to be separated by spaces while the alternate symbol, the full stop, is not. The SI brochure (5.3.4) specifies a "thin space" as a separator between groups of three digits whereas ISO 80000 does not.


23. Both specifications misname the product symbol
In ISO 80000 and SI 8, the unit-unit product symbol is referred to as "the half-high dot". There is no such character in ASCII, ISO-8859-1 or Unicode. Presumably this is the Unicode character "middle dot" (U+00B7); or is it the Unicode "dot operator" (U+22C5)?
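The two candidate characters can be compared using Python's standard unicodedata module, which reports the official Unicode name and general category of each. A short sketch (the helper name is mine):

```python
import unicodedata


def describe(code_point):
    """Return (Unicode name, general category) for a code point."""
    char = chr(code_point)
    return unicodedata.name(char), unicodedata.category(char)


for cp in (0x00B7, 0x22C5):
    name, category = describe(cp)
    print(f"U+{cp:04X}", name, category)
```

Neither character is named "half-high dot": U+00B7 is classed as punctuation (Po) and U+22C5 as a mathematical symbol (Sm), so the choice matters to software even though the two look alike in print.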

Summary of fixes to build a syntactically unambiguous SI-notation
1. ISO 80000 should be made identical to SI8
2. Change the unit symbol 'd' to 'day'
3. Explicitly require that derived units containing numbers (e.g. fuel consumption 'L/100 km') are written in standard mathematical notation, i.e. 'L/(100 km)' or 'L/100/km' or 'L/km/100'.

Summary of fixes to build a more user-friendly SI-notation

1. Specify a single unit-unit product symbol, the keyboard symbol "*"
2. Change the prefix symbols "da, h, k" to "D, H, K"
3. Change the Greek symbols μ and Ω to (Latin) keyboard symbols "u" and "Ohm"


[drafted 2008-05-22; rev 2011-04-14]
