Duplicate elements of an array except the first and last elements

Question

I have an array, and I would like to repeat each element except the first and last elements.

For example, if the array has the five elements 1 2 3 4 5, then after duplication its elements should be 1 2 2 3 3 4 4 5.

My bash commands are:

$ newarr=("${myarr[0]}")
$ for i in $(seq 1 $((${#myarr[@]}-2))) ; do newarr+=( "${myarr[i]}" "${myarr[i]}"); done
$ newarr+=("${myarr[-1]}")

Is there a clearer way than mine?

I am also wondering how to wrap that into a function that takes myarr as an argument and returns newarr. (After creating such a function, I will read an array from a file, so that each element stores a line in the file, and then call the function on the array. If creating such a function is not a good approach, let me know.)

Thanks.

Kusalananda · Accepted Answer · 2018-11-17T19:31:32.123

2

You can reduce the number of operations ever so slightly and skip the call to seq:

for (( i = 1; i < ${#myarr[@]} - 1; ++i )); do
    newarr+=( "${myarr[i]}" "${myarr[i]}" )
done
newarr=( "${myarr[0]}" "${newarr[@]}" "${myarr[-1]}" )

This assumes that newarr is empty to start with. Run unset newarr first if it's not.
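A minimal self-contained script that runs the loop above on the five-element example from the question. It assumes bash 4.3 or later, which is needed for the negative subscript in ${myarr[-1]}:

```shell
#!/bin/bash
# Example input from the question.
myarr=( 1 2 3 4 5 )

unset newarr   # make sure newarr starts out empty

# Duplicate every element except the first and the last.
for (( i = 1; i < ${#myarr[@]} - 1; ++i )); do
    newarr+=( "${myarr[i]}" "${myarr[i]}" )
done

# Put the first and last elements back around the duplicated middle.
newarr=( "${myarr[0]}" "${newarr[@]}" "${myarr[-1]}" )

printf '%s\n' "${newarr[*]}"   # prints: 1 2 2 3 3 4 4 5
```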

As a function (this modifies the array that is passed):

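dup_interal_items () {
    typeset -n arr=$1      # name reference to the caller's array
    local i                # keep the loop counter out of the global scope
    local -a tmparr=()     # scratch array for the duplicated middle

    # With fewer than three elements there are no internal items; this also
    # avoids ${arr[-1]} failing on an empty array.
    (( ${#arr[@]} < 3 )) && return 0

    for (( i = 1; i < ${#arr[@]} - 1; ++i )); do
        tmparr+=( "${arr[i]}" "${arr[i]}" )
    done
    arr=( "${arr[0]}" "${tmparr[@]}" "${arr[-1]}" )
}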

The name of the array is passed into the function and the name-reference variable arr is used to access the elements in the array. At the end, the original array is updated to contain the result.

I did it this way rather than returning an array as you can only return an exit status from a function. The other approach would have been to pass both the names of an input array and an output array and use two name-reference variables in the function.
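A sketch of that two-name alternative, assuming bash 4.3 or later for local -n. The function name dup_internal_items_to and the local names _src and _dst are hypothetical. The leading underscores only make it less likely that a caller's array has the same name as a name reference, for the name-scoping reason described in this answer.

```shell
#!/bin/bash
# Hypothetical helper: read the array named by $1 and write the result into
# the array named by $2. The caller's arrays must not be named _src or _dst.
dup_internal_items_to () {
    local -n _src=$1 _dst=$2
    local i

    _dst=()                                   # start the output array empty
    (( ${#_src[@]} == 0 )) && return 0
    _dst+=( "${_src[0]}" )                    # first element, once
    for (( i = 1; i < ${#_src[@]} - 1; ++i )); do
        _dst+=( "${_src[i]}" "${_src[i]}" )   # internal elements, twice
    done
    (( ${#_src[@]} > 1 )) && _dst+=( "${_src[-1]}" )   # last element, once
    return 0
}

myarr=( 1 2 3 "here we go" )
newarr=()
dup_internal_items_to myarr newarr
printf 'Element: %s\n' "${newarr[@]}"
```

Because the input array is left untouched, the caller keeps myarr and gets the result separately in newarr.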

Or, you could possibly echo or printf the values in the function (in which case you don't have to construct a new array at all) and then parse that data in the main code. The function would then have to be called inside a command substitution.
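A minimal sketch of the printing approach, assuming a non-empty array in which no element contains a newline (each element is written as one line). The function name print_dup_internal is hypothetical; it takes the elements themselves as arguments rather than an array name, and the caller parses its output with readarray -t.

```shell
#!/bin/bash
# Hypothetical helper: print the arguments one per line, printing every
# argument except the first and the last twice.
print_dup_internal () {
    local i
    for (( i = 1; i <= $#; ++i )); do
        printf '%s\n' "${!i}"                           # ${!i} is the i-th argument
        (( i > 1 && i < $# )) && printf '%s\n' "${!i}"  # internal: print again
    done
    return 0
}

myarr=( 1 2 3 4 5 )
out=$(print_dup_internal "${myarr[@]}")   # called inside a command substitution
readarray -t newarr <<< "$out"            # parse one element per line
printf '%s\n' "${newarr[*]}"
```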

Note that you can't call this function with an array called arr due to the particular name scoping rules used by bash. You may want to rename the arr variable in the function if this is an issue.

Testing:

$ myarr=( 1 2 3 "here we go" )
$ dup_interal_items myarr
$ printf 'Element: %s\n' "${myarr[@]}"
Element: 1
Element: 2
Element: 2
Element: 3
Element: 3
Element: here we go

To duplicate all lines of a file except for the first and last line:

sed -e '1b' -e '$b' -e 'p' <file

The sed script branches to the end (where there is an implicit print statement) if it's on the first or last line, but prints all other lines (all other lines are therefore both explicitly printed by that last p and implicitly printed).

Or, as contributed by don_crissti,

sed 'p;1d;$d' <file

which explicitly prints each line, then ends the cycle for the first and last line, but prints all other lines implicitly (a second time).

An equivalent awk program that does not store more than a single line in memory would be non-trivial to write.
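Following the suggestion in the comments to read the array from sed's output when the data comes from a file anyway, here is a minimal sketch. It assumes bash 4 or later for readarray (a synonym of mapfile) and uses a throwaway file from mktemp as sample input; it also checks that the two sed scripts above produce the same output.

```shell
#!/bin/bash
# Sample input: one element per line in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 1 2 3 4 5 > "$tmpfile"

# Both sed scripts duplicate every line except the first and the last.
out1=$(sed -e '1b' -e '$b' -e 'p' <"$tmpfile")
out2=$(sed 'p;1d;$d' <"$tmpfile")

# Read the duplicated lines straight into an array, one line per element.
readarray -t newarr < <(sed 'p;1d;$d' <"$tmpfile")

printf '%s\n' "${newarr[*]}"
rm -f "$tmpfile"
```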

edited Nov 17 '18 at 19:31

answered Nov 17 '18 at 18:27

Kusalananda


Thanks. (1) "you can't call this function with an array called arr due to the particular name scoping rules used by bash." Do you mean arr declared as typeset -n arr=$1 inside the function body has global (script-wide) scope? (2) Among the three approaches when creating a bash function, which one do you suggest more or none? – Tim Nov 17 '18 at 21:08
@Tim (1) A name reference variable can't reference another variable that has the same name as itself (see also Circular name references in bash shell function, but not in ksh). (2) It depends on what you'd like to do and how you want/need to get there. If the original data was coming from a file, and wasn't used for anything else, I would probably go with reading the array from sed instead of reading it and manipulating it. If the resulting data would go to a file, I would not read it into an array at all. – Kusalananda Nov 17 '18 at 21:13
Thanks. "If the resulting data would go to a file, I would not read it into an array at all." Could you be more specific? – Tim Nov 17 '18 at 22:39
@Tim Well, if all you want to do is to duplicate the internal lines of a text file and write that to a new file, then there is that sed command that could do that for you. Why read it into an array in bash? – Kusalananda Nov 17 '18 at 22:41
I am thinking of writing a shell script, which takes several input files. Each file contains lines of strings in their order. The script will invoke tsort to figure out the total order among the strings from all the input files, and write these strings to stdout in the total order. So when the script reads one input file, it has to duplicate the strings except the first and last one. Then the script will merge the pairs of strings from all the input files, before invoking tsort on them – Tim Nov 17 '18 at 22:50
I must be misconstruing bash 4.4.23's man page, as I read: "The nameref attribute cannot be applied to array variables." That is precisely what you seem to be doing, since what you seem to pass to the function (as $1) is the array, not just its name by reference, and then you apply typeset -n arr=$1. I am seriously confused.... o_O ... ¿?@! – Cbhihe Nov 18 '18 at 17:45

@Cbhihe No, you are misreading. It says "Array variables cannot be given the nameref attribute. However, nameref variables can reference array variables and subscripted array variables." In my code, arr (in the function) is not an array variable, it's an ordinary variable that I give the nameref attribute to. It then references an array variable. – Kusalananda Nov 18 '18 at 17:55

score 1 · Answer 2 · answered Nov 17 '18 at 22:12


Also try this (no function needed):

$ ARR=(1 2 3 4 5)
$ IFS=$'\n\t '
$ readarray NEWARR < <(echo "${ARR[*]}" | sed '1!p; $d')
$ echo ${NEWARR[@]}
1 2 2 3 3 4 4 5

or even

NEWARR=($(printf "%s\n" ${ARR[@]} | sed '1!p; $d'))

answered Nov 17 '18 at 22:12

RudiC


Please avoid using variable names in caps (except for environment variables). – Nov 18 '18 at 03:06
You should quote variable expansions (${NEWARR[@]}) and use -t on readarray to remove the trailing newline of each array element. – Nov 18 '18 at 03:09
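A version of the commands above that follows both comments: lowercase names, readarray -t so each element loses its trailing newline, quoted expansions, and printf instead of changing IFS globally. Like the original, it assumes no element contains a newline.

```shell
#!/bin/bash
arr=( 1 2 3 4 5 )

# One element per line into sed; -t strips each line's newline on the way back.
readarray -t newarr < <(printf '%s\n' "${arr[@]}" | sed '1!p; $d')

printf '%s\n' "${newarr[@]}"
```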

score 1 · Answer 3 · answered Nov 18 '18 at 03:21

A solution that also works for elements containing newlines:

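#!/bin/bash
arr=( 1 2 "3 3" $'41\n42' 5 )

# NUL-delimit the elements so embedded newlines survive; sed -z (GNU sed)
# treats each NUL-terminated record as a "line". readarray -d needs bash 4.4+.
readarray -t -d $'\0' newarr < <(printf '%s\0' "${arr[@]}" | sed -z '1!p; $d')

printf '<%s>\n' "${newarr[@]}"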

Runs as:

$ ./script
<1>
<2>
<2>
<3 3>
<3 3>
<41
42>
<41
42>
<5>
