Maths for librarians

Some of you may have had occasion to run into mathematicians and to wonder therefore how they got that way
Tom Lehrer, introduction to Lobachevsky
Me myself I got nothing to prove
Tracy Chapman, Fast car

Librarianship in the United Kingdom is now a graduate profession: it is necessary (though far from sufficient) to have a degree, whether in librarianship or a more conventional academic subject. Many people, when they've finished laughing, have expressed surprise that entry to the profession needs to be guarded so jealously. This page is a first attempt to indicate how my first degree in mathematics is highly relevant to my vocation as a librarian - and vice versa.

1. Information science and information theory

Terminology can be treacherous. The first thing one learns on a librarianship course is always to append 'information' to its title. Michael Gorman has reported the observation that information science is no more than librarianship practised by men[*]; it has been called a postmodern science, which seems appropriate for a parody of true science. In fact the word 'science' is used here in its older and general sense of 'learning', just as 'library science' calques the German Bibliothekswissenschaft; but it is easy to see the usage as an attempt to steal the prestige of science.

There is indeed a legitimate scientific approach to information, the information theory developed by engineers and mathematicians. The information of an event is defined as -log(p) bits or shannons where p is the non-zero probability of the event and the logarithm is taken to base 2. (It is really a theory of data, rather than information.) Under this definition, an increase in information implies a decrease in uncertainty; it is apparent that the theory cannot deal with the contradictory information that might be present in a library.
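As a minimal sketch of the definition above (the function name self_information is my own invention for illustration, not a standard one), the information of an event can be computed directly from its probability:

```python
import math


def self_information(p: float) -> float:
    """Information, in bits (shannons), of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    # Logarithm to base 2, negated so that rarer events carry more bits.
    return -math.log2(p)


# A fair coin toss carries one bit; a one-in-eight event carries three.
coin = self_information(0.5)
rare = self_information(1 / 8)
```

A certain event (p = 1) carries no information at all, which is the sense in which more information means less uncertainty.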

A code is defined as a map from sourcewords to codewords (finite strings of symbols). Broadly speaking, any classification scheme is a code in this sense, mapping subjects to classmarks, although there is often more than one classmark for a single document.

Hence it is not strictly essential for classmarks to be uniquely decipherable, even when close classification, or a fully enumerative scheme, allows this; it suffices that only related topics have the same classmark. It is not even desirable for a classification scheme to be instantaneous, that is, allowing instant decoding because no codeword is a prefix of another: this condition would remove a major advantage of hierarchical notation, namely broadening by truncation. A code with as small an average word length as possible is called compact. Coding theory aims for optimal encodings of messages rather than of individual items, so it would be worthwhile to examine the expected average length of classmarks in different schemes.
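These notions can be made concrete with a toy example. The scheme below is a deliberately simplified, hypothetical notation loosely modelled on Dewey (trailing zeros dropped), and the usage frequencies are invented purely for illustration; none of it is taken from a real schedule.

```python
def is_instantaneous(codewords):
    """True if no codeword is a proper prefix of another (a prefix-free code)."""
    words = set(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def broaden(classmark):
    """Broadening by truncation: drop the last symbol to reach the parent class."""
    return classmark[:-1]


def average_length(code, probabilities):
    """Expected codeword length, weighting each subject by how often it occurs."""
    return sum(probabilities[subject] * len(mark) for subject, mark in code.items())


# Hypothetical hierarchical scheme: each classmark extends its parent's.
toy_scheme = {
    "science": "5",
    "mathematics": "51",
    "geometry": "516",
    "librarianship": "02",
}
# Invented usage frequencies (assumed equal for simplicity).
usage = {subject: 0.25 for subject in toy_scheme}

hierarchical_is_instantaneous = is_instantaneous(toy_scheme.values())
parent_of_geometry = broaden(toy_scheme["geometry"])
mean_length = average_length(toy_scheme, usage)
```

The toy scheme fails the instantaneous test precisely because it is hierarchical: "5" is a prefix of "51", which is what lets truncating "516" land on the classmark for mathematics.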

See: Dominic Welsh, "Codes and cryptography" (Clarendon Press, 1989).

[*] 'A gifted younger colleague has proposed a definition that deserves to live forever as Koger's Insight: "Information science is librarianship practiced by men."' Michael Gorman, 'A bogus and dismal science or the eggplant that ate library schools', American libraries 21(5) (1990).

2. The Librarian's Nightmare

This is a playful name (which I came across in Brian Stewart's lecture notes on abstract algebra) for the result that any permutation of a finite number of objects can be obtained as a finite product of transpositions. The name derives from the potential chaos inherent in letting books get out of order, even through as small an error as two swapped neighbours. The proof, left as an exercise for the reader, is simple and uses induction.
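For those who prefer a demonstration to a proof, here is a sketch that recovers a list of neighbour swaps for any shelf of distinct, comparable books. It is ordinary bubble sort with the swaps written down, not the induction argument itself, and the function names are my own.

```python
def neighbour_swaps(shelf):
    """Return positions i such that swapping books i and i+1, in order, sorts the shelf."""
    books = list(shelf)
    swaps = []
    # Bubble sort: each pass carries the largest remaining book to the right.
    for end in range(len(books) - 1, 0, -1):
        for i in range(end):
            if books[i] > books[i + 1]:
                books[i], books[i + 1] = books[i + 1], books[i]
                swaps.append(i)
    return swaps


def apply_swaps(shelf, swaps):
    """Apply a sequence of neighbour swaps to a copy of the shelf."""
    books = list(shelf)
    for i in swaps:
        books[i], books[i + 1] = books[i + 1], books[i]
    return books


disordered = [3, 1, 2]
swaps = neighbour_swaps(disordered)
tidied = apply_swaps(disordered, swaps)
# Each swap undoes itself, so the reversed sequence recreates the mess.
mess_again = apply_swaps(tidied, list(reversed(swaps)))
```

Read backwards, the same swaps show how a tidy shelf can reach any disorder whatever through nothing worse than neighbours changing places, which is the nightmare.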

3. Russell's Paradox

The logician Bertrand Russell famously shattered Frege's attempt to put mathematics on a secure logical foundation by communicating to him a simple paradox in set theory. Consider R, the set of all sets that do not contain themselves. Is R an element of R, or not? This is often rewritten as the Barber Paradox: the village barber shaves precisely those men who do not shave themselves; who shaves the barber? It has been suggested that she doesn't need to shave, and indeed this is not a true paradox but a proof (by contradiction) that no such set (or barber) can exist.

A less familiar reformulation is in terms of library catalogues. A keen librarian creates a meta-catalogue listing all those catalogues which do not contain themselves. Does this meta-catalogue merit an entry in itself?

Another famous 'paradox' recorded by Russell is the proof by contradiction that there exists no 'smallest number not specifiable in seven words'. He attributed it to one Mr Berry, a librarian at the Bodleian Library.

See: Francis Moorcroft, 'Russell's Paradox' in The philosopher's magazine 3, 1998.
A.D. Irvine, 'Russell's Paradox' in the Stanford encyclopedia of philosophy edited by Edward N. Zalta.

4. The Library of Babel

It is a commonplace that a monkey provided with a typewriter and an infinite amount of time and bananas will produce the complete works of Shakespeare (indeed, a variorum edition) with probability one. Unfortunately, it will also produce every other finite string, including the text of this web page and other nonsense.

Many people have compared the Internet to a Universal Library, but the Library of Babel, as imagined by Jorge Luis Borges, is a more idealistic concept. Willard van Orman Quine points out that to have a Universal Library, it suffices to own two books, one with a dot and one with a dash. Repeated reference to these books will produce any imaginable text (subject to interpretation). This binary system is the principle on which you are reading these words.
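A minimal sketch of the two-book library follows. The choice of eight marks per UTF-8 byte is my own assumption for illustration; it is simply one convenient "interpretation" among many, and not a scheme attributed to Quine.

```python
def to_dots_and_dashes(text):
    """Spell out a text using only two marks: '.' for 0 and '-' for 1."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return bits.replace("0", ".").replace("1", "-")


def from_dots_and_dashes(marks):
    """Recover the text, reading the marks back eight at a time."""
    bits = marks.replace(".", "0").replace("-", "1")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


marks = to_dots_and_dashes("Babel")
restored = from_dots_and_dashes(marks)
```

Every mark is a consultation of one book or the other, and the round trip returns the original text unchanged.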

Ian Stewart adopts the idea of the universal library, in symbols, to illustrate the proof of the Banach-Tarski Paradox. This famously counterintuitive result, dependent on the Axiom of Choice, states that a solid sphere may be dissected into five pieces which reassemble under rigid motions to form two solid spheres of the same size as the original sphere. Imagine a universal dictionary, the Hyperwebster, which contains all possible strings of letters of the alphabet; this is clearly the universal library contained in a single book.

This Hyperwebster can be dissected into 26 copies of itself simply by separating its contents into all words beginning with A, all words beginning with B, and so on, and identifying Aword, Bword, and so on with word in the obvious way. The dictionary contains - and can be dissected into - 26 copies of itself, plus the individual letters from A to Z. The link from this to solid geometry is explained in Ian Stewart's book.
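The dissection can be checked on a pocket edition. The sketch below assumes a two-letter alphabet and a maximum word length, both my own simplifications: a finite dictionary necessarily loses one level when the initial letter is struck off, whereas the true Hyperwebster, being infinite, loses nothing.

```python
from itertools import product

ALPHABET = "AB"  # assumed two-letter alphabet standing in for A to Z


def words_up_to(n, alphabet=ALPHABET):
    """All non-empty strings over the alphabet of length at most n."""
    return {
        "".join(letters)
        for length in range(1, n + 1)
        for letters in product(alphabet, repeat=length)
    }


def volume(letter, n, alphabet=ALPHABET):
    """Words beginning with `letter`, with that initial struck off.

    The one-letter word itself is set aside: it is the leftover 'individual letter'.
    """
    return {
        word[1:]
        for word in words_up_to(n, alphabet)
        if word.startswith(letter) and len(word) > 1
    }


volume_a = volume("A", 4)
volume_b = volume("B", 4)
smaller_dictionary = words_up_to(3)
```

Each volume, once its initial is removed, is the whole dictionary of words one letter shorter.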

See: Willard van Orman Quine, "Quiddities: an intermittently philosophical dictionary" (Penguin, 1990), s.v. 'Universal library'.
Ian Stewart, "From here to infinity" (OUP, 1996; revised version of "The problems of mathematics"), pp. 175-176.

5. The fractal dimension of the Dewey Decimal Classification

A fractal is an object of awkward dimensionality (formally, its Hausdorff-Besicovitch dimension is strictly greater than its topological dimension). Characteristically, a fractal is self-similar - it is difficult to tell from internal evidence on what scale you are looking at it. Although I suspect that under the formal definition the Dewey decimal classification is not a fractal, it is designed to be self-similar on various scales, especially through its synthetic features such as "add from" instructions. Further investigation is required.

6. Great mathematical librarians

I am the latest link of an illustrious chain linking the two professions, starting in the classical world. Eratosthenes of Cyrene (276-194 BCE) was the third librarian to take charge of the great collection at Alexandria. Distinction enough, but he famously deduced the circumference of the Earth (as told in a children's book, "The librarian who measured the Earth") and invented the laborious but effective method of listing primes, the sieve of Eratosthenes.
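The sieve is simple enough to sketch in a few lines. This is a conventional modern rendering, with the usual shortcut of starting each crossing-out at the square of the prime; it makes no claim to reproduce Eratosthenes' own procedure.

```python
import math


def primes_up_to(n):
    """List the primes not exceeding n by crossing out multiples."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [number for number, flag in enumerate(is_prime) if flag]


small_primes = primes_up_to(30)
```

The labour lies in visiting every multiple; the effectiveness lies in never having to divide.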

More recently, Leibniz (1646-1716), co-inventor of the calculus, was appointed librarian to the Duke of Hanover. The minor mathematician Charles Dodgson (1832-1898), Sub-Librarian of Christ Church, Oxford, donned the pseudonym Lewis Carroll to write not only "Alice's adventures in Wonderland" and its sequel "Through the looking glass", but a number of more explicitly mathematically-inclined stories such as "Sylvie and Bruno". Martin Gardner has illuminated some of the hidden mathematics in his annotated 'Alice' and annotated 'Snark'.

Those two great figures Melvil Dewey (1851-1931, inventor of the eponymous classification and instigator of the first library school) and S.R. Ranganathan (1892-1972, who devised the theoretically elegant Colon Classification) were both mathematicians. Another was Samuel C. Bradford (1878-1948), a librarian at the Science Museum in London, who propounded the eponymous law of the scattering of journal literature (described in this page on bibliometrics).


This piece of whimsy was inspired not by the existing academic literature on bibliometrics, informetrics and statistical testing, of which Philip Morse's 'Library effectiveness' is one of the more intimidating examples. Rather it stemmed from a chance remark by Matthew Phillips, Assistant Librarian at Christ Church, Oxford, that a book on "Algorithms for librarians" covering sorting techniques would be a boon to the library world. Some more mathematical whimsy is available in my Sylvester, poetaster, in which the great J.J. Sylvester is unfairly mocked for his delusions of poetic grandeur.

owen@massey.net