Which language provides the most random alphabetically sorted sequence?
Data
| N | Eng | Dut | Ger | Tur | Chi | Lex |
|----+-----+-----+-----+-----+-----+-----|
| 1 | 8 | 8 | 8 | 6 | 8 | 1 |
| 2 | 11 | 3 | 3 | 5 | 2 | 10 |
| 3 | 5 | 1 | 1 | 1 | 9 | 11 |
| 4 | 4 | 11 | 11 | 9 | 6 | 12 |
| 5 | 9 | 9 | 5 | 4 | 3 | 2 |
| 6 | 1 | 10 | 9 | 2 | 4 | 3 |
| 7 | 7 | 12 | 6 | 10 | 7 | 4 |
| 8 | 6 | 2 | 7 | 11 | 10 | 5 |
| 9 | 10 | 4 | 4 | 12 | 12 | 6 |
| 10 | 3 | 5 | 10 | 8 | 11 | 7 |
| 11 | 12 | 6 | 2 | 3 | 5 | 8 |
| 12 | 2 | 7 | 12 | 7 | 1 | 9 |
Sourced from comments in the thread (English from the image, Dutch from [email protected], German from [email protected], Turkish from some rando, Chinese from [email protected], Lexicographical from [email protected]).
Plot with Correlation Scores
We will compute the Pearson correlation score (the r statistic) by comparing the base number (column 1) with the corresponding language column. We will also compute the serial correlation by creating staggered columns that hold the previous value of each sequence, which measures how closely a number in a sequence tracks the one before it.
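As a cross-check on the two scores described above, here is a minimal Python sketch. The helper names `pearson` and `serial_correlation` are made up for illustration, and only the English and Lexicographical columns from the data table are included. It assumes the serial correlation is a lag-1 Pearson correlation, pairing each value with the one before it and skipping the first row, which has no predecessor.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def serial_correlation(ys):
    """Lag-1 correlation: each value paired with the value one row earlier."""
    return pearson(ys[1:], ys[:-1])

# Columns copied from the data table above.
base = list(range(1, 13))
english = [8, 11, 5, 4, 9, 1, 7, 6, 10, 3, 12, 2]
lexicographic = [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9]

for name, column in (("English", english), ("Lexicon", lexicographic)):
    print(f"{name:>10s} {pearson(base, column):+.3f} {serial_correlation(column):+.3f}")
```

The other language columns can be added to the loop in the same way. A score near 0 means little linear relationship, while a score near +1 or -1 means the sequence follows the base order (or its reverse) closely.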
Staggered Table
# Append the previous-row value of each language column as columns 8-13 (empty on the first row).
awk '{print $0"\t"prE"\t"prD"\t"prG"\t"prT"\t"prC"\t"prL; prE=$2; prD=$3; prG=$4; prT=$5; prC=$6; prL=$7}' alphabetic.tab \
| tee alphabetic.tab.stagger
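For readers less familiar with awk, the staggering step above can be sketched in Python as follows. The function name `stagger` is hypothetical, and only the first three data rows are shown. It assumes each row has the base number followed by the six language columns, as in the data table.

```python
def stagger(rows):
    """Append to each row the six language values from the row before it."""
    previous = [""] * 6  # the first row has no predecessor, so its lag columns are empty
    staggered = []
    for row in rows:
        staggered.append(list(row) + previous)
        previous = list(row[1:7])  # remember this row's language columns for the next row
    return staggered

# First three rows of the data table: N, Eng, Dut, Ger, Tur, Chi, Lex.
rows = [
    [1, 8, 8, 8, 6, 8, 1],
    [2, 11, 3, 3, 5, 2, 10],
    [3, 5, 1, 1, 1, 9, 11],
]

for row in stagger(rows):
    print("\t".join(str(value) for value in row))
```

Each output row has 13 columns, so column 8 holds the previous English value, column 9 the previous Dutch value, and so on, which is how the plot code pairs columns 2 through 7 with columns 8 through 13.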
Plot Code
gnuplot -p -e '
set xlabel "Base Sequence";
set ylabel "Alphabetic";
set xtics 1,1,12;
set ytics 1,1,12;
set title "Alphabetic Number Plot with Correlation Score";
set rmargin 25; set key at graph 1.5,0.9;
set size ratio 0.45;
stats "alphabetic.tab.stagger" using 1:2 name "E";
stats "" using 1:3 name "D";
stats "" using 1:4 name "G";
stats "" using 1:5 name "T";
stats "" using 1:6 name "C";
stats "" using 1:7 name "L";
stats "" using 2:8 name "ES";
stats "" using 3:9 name "DS";
stats "" using 4:10 name "GS";
stats "" using 5:11 name "TS";
stats "" using 6:12 name "CS";
stats "" using 7:13 name "LS";
set label 1 sprintf("%10s %6s %6s", "", "Base", "Stagger") at graph 1.07,0.95;
plot "" using 1:2 with lines lw 3 title sprintf("%10s %+.3f %+.3f", "English", E_correlation, ES_correlation),
"" using 1:3 with lines lw 3 title sprintf("%10s %+.3f %+.3f", "Dutch", D_correlation, DS_correlation),
"" using 1:4 with lines lw 3 title sprintf("%10s %+.3f %+.3f", "German", G_correlation, GS_correlation),
"" using 1:5 with lines lw 3 title sprintf("%10s %+.3f %+.3f", "Turkish", T_correlation, TS_correlation),
"" using 1:6 with lines lw 3 title sprintf("%10s %+.3f %+.3f", "Chinese", C_correlation, CS_correlation),
"" using 1:7 with lines lw 1 title sprintf("%10s %+.3f %+.3f", "Lexicon", L_correlation, LS_correlation)
'
It looks like Dutch has the lowest (near 0) correlation to both the base sequence and its own staggered sequence, with Turkish somewhat mirroring its staggered randomness.
The least random alphabetic sequences are English and German.
Updated: Added Chinese and staggered analysis.