44

From my experience with modern programming and scripting languages, I believe most programmers are generally accustomed to referring to the first element of an array as index 0 (zero).

I'm sure I've heard of languages other than zsh that start array indexing at 1 (one); that's okay, as it is equally convenient. However, as the previously released and widely used shell scripting languages ksh and bash both use 0, why would someone choose to alter this common convention?

There do not seem to be any substantial advantages to using 1 as the first index; so the only explanation I can think of for this somewhat "exclusive feature" among shells would be "they just did this to show off their cool shell a bit more".

I don't know much of either zsh or its history, though, and there is a high chance my trivial theory about this does not make any sense.

Is there an explanation for this? Or is it just out of personal taste?

Kusalananda
deekin
  • 7
    Some historical research (or rantish ruminations?) on the topic of 0 versus 1: http://exple.tive.org/blarg/2013/10/22/citation-needed/ – thrig Dec 30 '15 at 18:32
  • For historical reasons; that probably came from csh, which also used one-based array indexing. – cuonglm Dec 30 '15 at 18:45
  • sh has no array, or rather it has one array $@ that does start at 1 not 0 – Stéphane Chazelas Dec 30 '15 at 21:14
  • I don't know what type of /bin/sh I'm on, but I guess it's not vanilla because I'm able to declare arrays and they start at 0. I'm on Arch Linux. Was it updated to support arrays? Or does Arch replace /bin/sh with something like ash or dash? – deekin Dec 30 '15 at 21:23
  • readlink /bin/sh – mikeserv Dec 30 '15 at 21:24
  • TIL. Well, seems like pacman -S sh also redirects to bash. They must really hate that poor ol' shell. – deekin Dec 30 '15 at 21:33
  • 3
    Nowadays, sh is a standard language (not an implementation) that has different possible interpreters. Some of those interpreters for the sh language, like bash, ksh and yash, support arrays as an extension, but they are not part of the language, just as gcc, a compiler for the standard C language, supports extensions over the standard C language. And just like for C, there is no "official" implementation of a "sh" interpreter. – Stéphane Chazelas Dec 30 '15 at 21:34
  • @StéphaneChazelas Thanks for drawing an analogy between the two and clarifying a bit. – deekin Dec 30 '15 at 21:40
  • 4
    Maybe a bit off topic, but relevant. The Romans used inclusive counting, starting from one instead of zero. The day after tomorrow, to us "two days ahead", was to them "three days ahead." They counted today as one, not zero. As a consequence, when their Egyptian astronomers recommended a leap day every fourth year, the Romans actually introduced it every third year, starting in 45 BCE. It took until 12 BCE for the error to be corrected. – Harry Weston Jan 01 '16 at 12:11
  • Also note that historically, programming languages were invented for solving mathematical problems, and mathematicians more often count from 1 than from zero. FORTRAN pioneered 1-based indexing. APL used it too (but had the option to start all indexes at 0, which could give funny effects when using a library). Some languages (PL/1, Ada,..) let you define for each array separately where the index starts. While zero-based indexing seems to be more popular overall now, I wouldn't call it a "standard". – user1934428 Dec 20 '17 at 08:16
  • So let's take 20 years of every Linux distro (sans a few) providing bash as the default shell where generations of scripts have been developed indexing arrays (and string indexing for that matter) and suddenly break all those existing scripts with a shell that doesn't start counting at 0 like every computer on earth does. (which is probably one of the primary reasons zsh is relegated for use for Live CD installs, etc..) – David C. Rankin Jun 07 '19 at 14:17

2 Answers

50
  • Virtually all shell arrays (Bourne, csh, tcsh, fish, rc, es, yash) start at 1. ksh is the only exception that I know (bash just copied ksh).
  • Most interpreted languages at the time (early 90s): awk, tcl at least, and tools typically used from the shell (cut -f1-3, head -n 3, sort -k1,3, cal 1 2015, comm -1) start at 1. sed, ed, vi number their lines from 1...
  • zsh takes the best of the Bourne shell and csh. The Bourne shell array $@ starts at 1. zsh is consistent with its handling of $@ (like in Bourne) or $argv (like in csh). See how confusing it is in ksh where ${@:0:1} does not give you the first positional parameter for instance.
  • A shell is a user tool before being a programming language. It makes sense for most users to have the first element in $a[1]. It also means that the number of elements is the same as the last index (in zsh like in most other shells except ksh, arrays are not sparse).
  • a[1] for the first element is consistent with a[-1] for the last.
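
A minimal bash sketch of the points about $@ above, assuming bash 4.0 or later (the variable names are made up for illustration). It shows that positional parameters are numbered from 1, that their count equals the index of the last one, and that a slice starting at offset 0 yields $0 rather than the first parameter, while an ordinary ksh-style array holding the same values starts at 0:

```bash
#!/usr/bin/env bash
# Positional parameters: "$@" is "$1" "$2" ...; $0 is not part of it.
set -- alpha beta gamma

first_param=$1            # numbered from 1
param_count=$#            # 3, which is also the index of the last parameter
last_param=${!#}          # indirect expansion: the parameter whose number is $#

slice_from_one=${@:1:1}   # the first positional parameter
slice_from_zero=${@:0:1}  # offset 0 selects $0, not "alpha"

# An ordinary bash (ksh-style) array holding the same values starts at 0.
arr=("$@")
first_element=${arr[0]}
last_index=$(( ${#arr[@]} - 1 ))   # here count and last index differ by one

printf 'first_param=%s last_param=%s count=%s\n' "$first_param" "$last_param" "$param_count"
printf 'slice_from_one=%s slice_from_zero=%s\n' "$slice_from_one" "$slice_from_zero"
printf 'first_element=%s last_index=%s\n' "$first_element" "$last_index"
```

What slice_from_zero prints depends on how the script is invoked, since it is whatever $0 happens to be.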

So IMO the question should rather be: Why did David Korn choose to make his shell's arrays start at 0?

About your:

"However, as the previously released and widely used shell scripting languages ksh and bash both use 0"

Note that while bash was indeed released before zsh (June 1989 compared to December 1990), array support was added in their respective 2.0 versions, and zsh's was released in 1991 while bash's came much later, in 1996.

The first Unix shell to introduce arrays (unless you want to consider the 1970's Thompson shell with its $1..$2 positional parameters) was csh in the late 70s whose indexes start at one. And its code was freely available, while ksh was proprietary and often not included by default on Unices (sold separately at a hefty price) until the late 80s. While ksh93 code was released as open source circa 2000, ksh88's, to this day, never was (though it's not too difficult to find ksh86a and ksh88d source code on archive.org today if you're interested in archaeology).

  • 14
    It's a bug in human languages that numbering is one-based, and remarkable serendipity that most programming languages managed to keep that legacy out of their design. All the sadder that, after zero-based indexing was virtually established as the standard, so many “user-oriented” languages got it backwards, trying to be “simpler” by using the wrong, one-based indexing again. — That said: the best way to avoid indexing confusion is of course to avoid numerical indices entirely. – leftaroundabout Dec 31 '15 at 10:46
  • Reading here: "When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript,... a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N." Don't know if TROLL or STUPID, but you can use "1 ≤ i ≤ N" for subscripts starting with 1 on arrays. There is no need to put that +1 at the end of the array scan, and neither point of view proves that starting indexes with 1 or 0 is better. –  Dec 31 '15 at 11:01
  • 1
    You can also argue that 0 was added to human knowledge AFTERWARDS, and that it somehow messed things up at that time. http://www.theguardian.com/notesandqueries/query/0,5753,-1358,00.html - Again, just a matter of point of view :) –  Dec 31 '15 at 11:04
  • 4
    The Bourne shell array $@ starts at 0 (not 1): $0 is the name of the program you are running. You should correct your third point. – Alex Jones Jan 07 '16 at 09:54
  • 3
    @edwardtorvalds, No, $0 is not a positional parameter. It is not part of $@. "$@" is "$1" "$2" .... When it comes to functions, in many shells, you see that "$@" are the arguments to the function, while $0 stays the script path (or shell argv[0] when not running a script) – Stéphane Chazelas Feb 22 '16 at 15:42
  • @leftaroundabout: a bug in human languages? that's a pretty strong statement. Can you support it straightforwardly with plain facts and clear reasoning? – iconoclast Mar 06 '17 at 20:05
  • 4
    @iconoclast Dijkstra argues very well for zero-based in the article I already linked to. — To give a single argument myself: with zero-based, you can easily calculate the absolute position of an element in a higher-rank array from the indices of the individual array dimensions, with one-based you get an awkward off-by-one situation with every extra dimension. Now, one may well question the need to ever index multidimensional data this way in the first place, but from my experience it's sometimes not feasible to avoid. – leftaroundabout Mar 06 '17 at 22:36
  • @nwildner: it's neither Troll nor stupid (heck, it's Dijkstra!). Yes, you can use 1 ≤ i ≤ N, but that only works smoothly as long as you're only concerned with a single range in which everything happens (and in that case there's really no need to use any indices at all; it could better be done with higher-level functor mapping combinators). The interesting situations where indexing is necessary are when you have multiple ranges to keep apart, and that gets painful much quicker with 1 ≤ i ≤ N, since you can't just concatenate ranges because neither of the boundaries is exclusive. – leftaroundabout Mar 06 '17 at 22:42
  • @leftaroundabout: thanks! I didn't see your link because the styling on this site didn't give enough contrast to make that single word stand out. I'll check out his argument, and thank you for your summary. – iconoclast Mar 07 '17 at 15:44
  • Note that the words for "first" and "second" in several languages such as English, Italian and Malay are not derived from the same roots as "one" and "two", but "third" and "fourth" and upwards are. I speculate that the concepts of position and counting were initially separate, but merged as we required names for more and more positions, at which point we didn't have 0 yet. – Artelius Sep 27 '20 at 03:35
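
The off-by-one argument made in the comments about higher-rank arrays can be sketched with bash arithmetic. This is only an illustration: the function names are hypothetical and a row-major layout is assumed. With zero-based indices the flat position is a plain multiply-and-add, while the one-based version needs a correction on each index plus one on the result:

```bash
#!/usr/bin/env bash
rows=3 cols=4   # a 3 x 4 grid stored as one flat, row-major sequence

# Zero-based: row and col both counted from 0, result counted from 0.
flat_zero_based() {
    echo $(( $1 * cols + $2 ))
}

# One-based: row and col both counted from 1, result counted from 1.
flat_one_based() {
    echo $(( ($1 - 1) * cols + ($2 - 1) + 1 ))
}

# Position of the last cell under each convention.
last_zero=$(flat_zero_based $(( rows - 1 )) $(( cols - 1 )))
last_one=$(flat_one_based "$rows" "$cols")

printf 'zero-based last cell: %s\n' "$last_zero"
printf 'one-based last cell:  %s\n' "$last_one"
```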
9

I think the most plausible answer to this is zsh's built-in support for reverse (negative) array indexing.

If you have an array with 4 elements, let's say myvar=(1 2 3 4), and you want to access the 4th element, it will be print $myvar[4], right?

However, if you want to list the elements inside this array backwards, it's just a matter of using negative indexes:

print $myvar[-1]   # will print 4
print $myvar[-2]   # will print 3
print $myvar[-3]   # will print 2
print $myvar[-4]   # will print 1

This should explain it: if indexing started from zero, you could not reach one of those elements this way, as there is no -0.

The second reason is probably that the C code handling variables in zsh uses integer types for array indexes, and since two's complement is used to represent negative numbers, there is no way to represent -0 (signed zero) as you can with floating-point variables.
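
For comparison, here is a small bash sketch of how -0 and negative subscripts behave in a 0-based shell. It assumes bash 4.2 or later, where negative subscripts count back from the end, and it reuses the array name from the example above. In integer arithmetic -0 is simply 0, so it selects the first element, and the last element is reached with -1:

```bash
#!/usr/bin/env bash
myvar=(1 2 3 4)            # bash arrays are 0-based

neg_zero=$(( -0 ))         # integer arithmetic has no separate negative zero
first=${myvar[0]}
also_first=${myvar[-0]}    # the subscript is evaluated arithmetically: -0 is 0
last=${myvar[-1]}          # negative subscripts count back from the end

# Walk the array backwards using the subscripts -1 .. -4.
reversed=()
for (( i = 1; i <= ${#myvar[@]}; i++ )); do
    reversed+=("${myvar[-i]}")
done

printf 'neg_zero=%s first=%s also_first=%s last=%s\n' \
    "$neg_zero" "$first" "$also_first" "$last"
printf 'reversed=%s\n' "${reversed[*]}"
```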

If you are really used to indexes starting at 0, I suggest you use the KSH_ARRAYS option to fix this.

And picking up on @cuonglm's comment, the csh features implemented in zsh are explained here. It seems not to be a historical reason but a way to provide a comfortable work environment for those who are used to csh.

  • One could argue that there is a "-0" in math and it's simply the same as "0" since zero is neither negative nor positive. – phk Dec 30 '15 at 18:12
  • 3
    Then, this should make you access the first and last array items at the same time, breaking all the logic of scanning an array ;) Could even create a blackhole. LOL –  Dec 30 '15 at 18:13
  • 3
    for 0-indexed arrays replace the - with ~. – mikeserv Dec 30 '15 at 21:16
  • On my bash 4.3.42-1 - gives the same output as ~ – deekin Dec 30 '15 at 21:28
  • @mfxx - until the array index isn't set, yes, it will. – mikeserv Dec 30 '15 at 21:29
  • @mikeserv I tried setting the ksharrays option in zsh. [-0] returns the first element, [-1] the last, [~0] the last. Cool! However, I didn't quite understand what you meant with "until the array index isn't set". You meant the zsh option? I thought you were referring to all the 0-indexed array shells so I tried with bash. – deekin Dec 30 '15 at 21:55
  • 1
    @mfxx - i was talking about bash. i think it handles negative indices for set elements as syntax sugar, but then doesn't do so otherwise. can't remember. from my perspective, people should just create a directory and use files for whatever they're stuffing into all that shell state. you can index those elements any which way you like. – mikeserv Dec 30 '15 at 22:00
  • 1
    ~ is binary inversion. ~0 is -1. (all bits flipped, relies on how negative numbers are usually represented bit-wise). On an unset array or an array with only one element of index 0, a[0], a[-0], a[-1] and a[~0] will give you the same thing in a ksh-like array. – Stéphane Chazelas Dec 30 '15 at 22:02
  • @mikeserv, a[-1] is a zsh feature recently added to bash and ksh93 – Stéphane Chazelas Dec 30 '15 at 22:04
  • 1
    @StéphaneChazelas - yeah, i think i remember handling this with ~ when -index didn't work. I guess probably all i did was ~-index. yeah. that sounds right. yes! and of course that works for unset elements as well. so i guess the comment above should be - for 0-indexed arrays add ~. i guess it just handled the -1 part easier maybe. i dunno. its not jumping to the forefront of my memory at the moment... – mikeserv Dec 30 '15 at 23:12
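
The ~ trick discussed in these comments can be checked with a rough bash sketch. It assumes bash 4.2 or later for negative subscripts, and the variable names are made up for illustration. The inverted values are computed first and then used as subscripts, so the only thing relied on is that ~n equals -n-1:

```bash
#!/usr/bin/env bash
a=(first middle last)

# ~ is bitwise NOT: ~n equals -n-1 in two's complement arithmetic.
not_zero=$(( ~0 ))            # -1
not_one=$(( ~1 ))             # -2

# Used as subscripts, these offsets count back from the end of the array.
last=${a[not_zero]}           # same element as a[-1]
second_last=${a[not_one]}     # same element as a[-2]

# With a single element at index 0, a[0], a[-0] and a[-1] all coincide.
single=(only)
single_first=${single[0]}
single_neg_zero=${single[-0]}
single_last=${single[-1]}

printf 'not_zero=%s not_one=%s\n' "$not_zero" "$not_one"
printf 'last=%s second_last=%s\n' "$last" "$second_last"
printf 'single: %s %s %s\n' "$single_first" "$single_neg_zero" "$single_last"
```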