August 28, 2015

Hugo Steinhaus, or K-means clustering in French

Kernel clustering
[Modern transcription of the Hugo Steinhaus paper in 1956 (in French), at the source of k-means clustering algorithms, published first in a french-written post]

Data clustering or clustering analysis belongs to statistical data analysis methods. It aims at forming groups of objects that are similar in some way. Those groups are named clusters. The word cluster is related to clot, for thick mass of coagulated liquid or of material stuck together

The whole set of objects contains heterogeneous data, that ought to be grouped into subsets possessing a greater inner homogeneity. Such methods rely on similarity criteria or proximity measures. They are related to classification, machine learning, segmentation, pattern recognition, and have applications ranging from image processing to bioinformatics.
One of the most popular clustering method is known as K-means (k-moyennes in French). with a variation called dynamic clustering (beautifully called nuées dynamiques in French, for an application in bilogy: Kinetic transcriptome analysis reveals an essentially intact induction system in a cellulase hyper-producer Trichoderma reesei strain, Dante Poggi et al., Biotechnology and Biofuels, 2014).

An history of K-means can be found in Data Clustering: 50 Years Beyond K-Means, Anil K. Jain, Pattern recognition letters, 2010. Other historic bits can be found in Origins and extensions of the k-means algorithm in cluster analysis, Hans-Hermann Bock, Electronic Journ@l for History of Probability and Statistics, 2008. This algorithm is deeply linked to Lloyd-Max algorithm, developed by Lloyd in  1957, and rediscovered by Max three after after. It is useful for optimal scalar quantifier design.

Sur la division des corps matériels en parties (pdf)

The K-means technique is a little older. It was published in French by Hugo Steinhaus in 1956 in the Bulletin de l’Académie Polonaise des Sciences (Bulletin of the Polish Academy of Sciences). Hugo Steinhaus (1887-1972) is a Polish mathematician, sometimes known as the discover of Stefan Banach. He contributed to numerous branches of mathematics, and considered a early founder of probability theory.

He also contributed to applied mathematics, working jointly with engineers, biologists, physician, economists, geologists, and "even lawyers". Lacking of trustworthy information during World War II, he invented a statistical tool to estimate German losses, using necrologic news from German soldiers on the front. He notably used the mention that the soldier killed was the first, second or third child from a family. He thus is a precursor of data science.
This paper is called "Sur la division des corps matériels en parties" (On the division of material bodies into parts). The first to explicitely  formulate in finite dimension the principle of k-mean clustering. It is much more constructive that the Banach-Tarski paradoxal theorem which delas with cutting a ball into two different balls, doubling in volume. This paper is pleasant to read, and evokes practical uses from type classification in anthropology to industrial object normalization.

Being in French, in a journal whose archives cannot be accessed easily, it has not been read as much as it deserved. . Here is it transduced by Maciej Denkowski, transmitted by Jérôme Bolte, and transcribed in LaTeX by  myself, with some effort to preserve the original typesetting and composition.

@Article{Steinhaus_H_1956_j-bull-acad-polon-sci_division_cmp-k-means,
  Title                    = {Sur la division des corps mat\'eriels en parties},
  Author                   = {Steinhaus, H.},
 File                     = {Steinhaus_H_1956_j-bull-acad-polon-sci_division_cmp-k-means.pdf:Steinhaus_H_1956_j-bull-acad-polon-sci_division_cmp-k-means:PDF},
  Journal                  = {Bulletin de l’Acad\'emie Polonaise des Sciences},
  Number                   = {12},
  Pages                    = {801--804},
  Volume                   = {Cl. {III} --- Vol. {IV}},
  Year                     = {1956},

  Owner                    = {duvall},
  Timestamp                = {2015.07.07.15.44}
}








5 comments:

  1. Interesting subject, loved that you kept (very much successfully) the original typesetting.

    ReplyDelete
  2. That you very much. It was not that easy, especially due to the fine spaces or the big E on line 5, which I have not been able to reproduce.

    ReplyDelete
  3. Thank you very much Laurent! Est-ce qu'il s'agit bien du papier original? J'ai cru que la première version publiée était en 1951 es s'appelait "Sur la liaison et la division des points d'un ensemble fini".
    http://matwbn.icm.edu.pl/ksiazki/cm/cm2/cm2145.pdf
    Sais-tu pourquoi Steinhaus a publié en Francais et pas en allemand ou polonais?
    Merci d'avance!

    ReplyDelete
  4. Thank you very much Laurent! Est-ce qu'il s'agit bien du papier original là?
    J'ai cru que la première publication était en 1951 et s'appelait "Sur la liaison et la division des points d'un ensemble fini"
    http://matwbn.icm.edu.pl/ksiazki/cm/cm2/cm2145.pdf
    Sais-tu pourquoi Steinhaus a publié en Francais et pas en allemand ou polonais?
    Merci d'avance!

    ReplyDelete
  5. I was lucky enough to get the [original paper scanned](http://laurent-duval.eu/_Share/Science/Steinhaus_H_1956_j-bull-acad-polon-sci_division_cmp.pdf) from a Poland library by Maciej Denkowski, after talks with Jérôme Bolte in 2015. The link above is the scan, for the blog entry I only retyped it as faithfully as I could in LaTeX. Merci pour le cadeau, je vais revisiter l'histoire que je croyais connaitre. Pour l'histoire, il semble qu'il y a eu une ancienne amitié culturelle et scientifique entre la Pologne et la France. Par exemple, on trouve sur [Wiki: Académie polonaise des sciences](https://fr.wikipedia.org/wiki/Acad%C3%A9mie_polonaise_des_sciences) "Le Centre scientifique de l'Académie polonaise des sciences à Paris (pl)10 (Stacja Naukowa PAN w Paryżu 11) trouve ses origines le 3 mai 1893, dans le cadre de la Bibliothèque polonaise de Paris et de la Société historique et littéraire polonaise". Une partie de cette histoire est racontée dans "L'Europe mathématique : Histoires, Mythes, Identités / Mathematical Europe: history, myth, identity", édité par Catherine Goldstein, Jeremy Gray, Jim Ritter

    ReplyDelete

Forfait Joule : grandeurs et unités du temps de travail

[Article très provisoire, pour arriver à la notion professionnelle de "forfait Joule" #ToutCaPourCa] Quand on parle de travail que...