What is the best unit? Of course there is no best unit. But for some purposes some units are better than others.
In digital documents, there is often a choice of units with which to specify absolute lengths. CSS, SVG and ODF have a choice of inches (in), centimeters (cm), millimeters (mm), picas (pc), points (pt) and pixels (px). Editing files with different computer programs or different versions of programs can lead to mixed use of units.
For example, when saving ODF with LibreOffice, the unit that is used for storage depends on the user preferences. This can lead to inconveniences and rounding errors. If I specify a margin of 1.25cm and send it to someone who has the preferences set to use inches, the margin will be stored as 0.4925in. When that number is converted back to centimeters, the value is 1.25095cm, which is about 0.8‰ more than the original value.
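The round trip can be checked with exact decimal arithmetic. This is only a small sketch that starts from the two values quoted in the example; the variable names are made up for illustration and nothing here reproduces how LibreOffice itself rounds.

```python
from decimal import Decimal

CM_PER_INCH = Decimal("2.54")      # exact by definition of the inch

original_cm = Decimal("1.25")      # the margin that was typed in
stored_in = Decimal("0.4925")      # the value stored in inches (from the example)

# Converting inches to centimeters is a multiplication by 2.54, so it is exact.
round_trip_cm = stored_in * CM_PER_INCH
relative_error = (round_trip_cm - original_cm) / original_cm

print(round_trip_cm)     # numerically equal to 1.25095
print(relative_error)    # numerically equal to 0.00076, just under 1 per mille
```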
In addition to the loss of precision, consider the user interface. That shows a weird value now: 1.25095cm. The lengths in office documents are usually not measurements; they are nice numbers chosen by users. Most people choose lengths to be round numbers in the particular unit they are currently working with. 0.3175cm and 9pt are exactly the same length, but only one representation is likely to be the one that was chosen by a user.
These conversion problems can be avoided by decoupling the units used in the user interface from the units used when saving.
Here is a small table with the exact multiplication factors for conversion between the length units:
| | in | cm | mm | pc | pt | px |
|---|---|---|---|---|---|---|
| in | 1 | 2.54 | 25.4 | 6 | 72 | 96 |
| cm | 50/127 | 1 | 10 | 300/127 | 3600/127 | 4800/127 |
| mm | 5/127 | 0.1 | 1 | 30/127 | 360/127 | 480/127 |
| pc | 0.5/3 | 1.27/3 | 12.7/3 | 1 | 12 | 16 |
| pt | 0.125/9 | 0.3175/9 | 3.175/9 | 0.25/3 | 1 | 4/3 |
| px | 0.03125/3 | 0.079375/3 | 0.79375/3 | 0.0625 | 0.75 | 1 |
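Factors like these can be derived instead of typed by hand. The sketch below assumes the CSS definitions of the absolute units (1in = 2.54cm = 6pc = 72pt = 96px) and uses exact rational arithmetic; `INCHES_PER_UNIT` and `factor` are illustrative names, not part of any library.

```python
from fractions import Fraction

# Size of each unit expressed in inches
# (assumed definitions: 1in = 2.54cm = 6pc = 72pt = 96px).
INCHES_PER_UNIT = {
    "in": Fraction(1),
    "cm": Fraction(100, 254),
    "mm": Fraction(10, 254),
    "pc": Fraction(1, 6),
    "pt": Fraction(1, 72),
    "px": Fraction(1, 96),
}


def factor(src, dst):
    """Exact multiplier that turns a value in `src` into a value in `dst`."""
    return INCHES_PER_UNIT[src] / INCHES_PER_UNIT[dst]


# One row per source unit, one entry per target unit.
for src in INCHES_PER_UNIT:
    print(src, [str(factor(src, dst)) for dst in INCHES_PER_UNIT])
```

The factors are printed as reduced fractions, so 2.54 shows up as 127/50.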
The multiplication factors that are plain decimal numbers (green) give lossless conversion. The factors that contain a division by 3, 9 or 127 (red) give lossy conversion for decimals. Let’s look at the conversion between inches and centimeters again. Conversion from inches to centimeters is lossless. The conversion with 2.54 will add two extra digits, but the resulting value can be written in decimal representation and is always exactly the same length.
Conversion in the other direction, from centimeters to inches, is problematic. The centimeter value is divided by 127, which leads to an infinite sequence of digits for most values.
The column for conversion to inches is almost entirely red: writing a length out in inches nearly always leads to a lossy conversion.
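Whether a factor is lossless can be decided mechanically: a fraction in lowest terms has a finite decimal representation exactly when its denominator has no prime factors other than 2 and 5. A minimal sketch of that test, with `is_lossless` as a hypothetical helper name:

```python
from fractions import Fraction


def is_lossless(factor):
    """True if multiplying any finite decimal by `factor` gives a finite decimal."""
    denominator = Fraction(factor).denominator  # Fraction is always in lowest terms
    for prime in (2, 5):
        while denominator % prime == 0:
            denominator //= prime
    # Anything left over (3, 127, ...) forces an endless sequence of digits.
    return denominator == 1


print(is_lossless(Fraction("2.54")))     # in -> cm: True
print(is_lossless(Fraction(50, 127)))    # cm -> in: False
print(is_lossless(Fraction("0.75")))     # px -> pt: True
print(is_lossless(Fraction(4, 3)))       # pt -> px: False
```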
There is no single unit that can save any length losslessly. But a combination of cm and pt does cover all cases. Any length can be written out without loss of precision by using either cm or pt.
Recently, I’ve developed an urge to normalize ODF files. ODF, especially in the flat file format (a single XML file), is convenient for writing texts that are stored in version control systems. It’s like plain text but with added features like bold, italic, named styles, tables and mathematical formulas. ODF is also the best file format you can hope to get from a non-programmer.
Programmers like to put files in revision control systems and look at diffs between different versions of a file. This is where normalization is needed. Office applications are really complex and each one has its own peculiarities in saving files. These peculiarities have no influence on the semantics of the documents but can make a diff between two versions larger than needed.
The solution is to remove the peculiarities. This process is called normalization. Normalization is hard to perfect, but even imperfect normalization is helpful.
One step in normalization is to standardize on the unit that is used for lengths. The conversion should be lossless. The unit that is used carries no meaning in ODF, but the length itself should be transferred exactly. For normalizing lengths in ODF I’m currently using the following logic: if a length can be converted losslessly to centimeters, do so; if not, use points. In this way, two documents with different origins will store equal lengths as equal character sequences.
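The logic above could look roughly like the following. This is a sketch and not the code I actually run: it assumes 1in = 2.54cm = 6pc = 72pt = 96px, handles only plain lengths such as 0.4925in, and the names `finite_decimal` and `normalize_length` are invented for the example.

```python
import re
from fractions import Fraction

# Exact size of each unit in centimeters
# (assumed definitions: 1in = 2.54cm = 6pc = 72pt = 96px).
CM_PER_UNIT = {
    "cm": Fraction(1),
    "mm": Fraction(1, 10),
    "in": Fraction(254, 100),
    "pc": Fraction(254, 600),
    "pt": Fraction(254, 7200),
    "px": Fraction(254, 9600),
}
PT_PER_CM = Fraction(7200, 254)

LENGTH = re.compile(r"(-?\d*\.?\d+)(in|cm|mm|pc|pt|px)")


def finite_decimal(value):
    """Format a Fraction as a decimal string, or return None if it never ends."""
    rest = value.denominator
    for prime in (2, 5):
        while rest % prime == 0:
            rest //= prime
    if rest != 1:
        return None  # a factor such as 3 or 127 remains: endless digits
    places = 0
    while (value * 10**places).denominator != 1:
        places += 1
    digits = str(abs((value * 10**places).numerator)).rjust(places + 1, "0")
    sign = "-" if value < 0 else ""
    if places == 0:
        return sign + digits
    return f"{sign}{digits[:-places]}.{digits[-places:]}"


def normalize_length(text):
    """Rewrite a length in cm if that is exact, otherwise in pt."""
    match = LENGTH.fullmatch(text)
    if match is None:
        return text  # not a length this sketch understands; leave it alone
    cm = Fraction(match.group(1)) * CM_PER_UNIT[match.group(2)]
    as_cm = finite_decimal(cm)
    if as_cm is not None:
        return as_cm + "cm"
    # A decimal value in pc, pt or px is always a finite decimal in points.
    return finite_decimal(cm * PT_PER_CM) + "pt"


for sample in ("0.4925in", "12.5mm", "9pt", "10pt", "1px"):
    print(sample, "->", normalize_length(sample))
```

Note that 9pt comes out as 0.3175cm, because this particular value happens to be exact in centimeters, while 10pt stays in points.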