
I imagine this is a pretty common thing to do.

Which POSIX utility would be used for reads, and which for writes? What are the most common file formats for this?

Is in-place modification possible?

My first thought was to use JSON and jq, but I'm interested in embracing the Unix philosophy.

Edit:

I don't think there is a standard tool for that, except for grep/awk/sed etc. But using these you will need to take care of a lot of other issues like locking, format, special characters, etc. https://unix.stackexchange.com/a/21950/551103

What if I didn't care about "locking, format, special characters" at all? Can someone give a minimalist implementation with grep/awk/sed?

  • Duplicate of https://unix.stackexchange.com/questions/21943/standard-key-value-datastore-for-unix I guess implementing it with the filesystem is the way to go – Tom Huntington Nov 09 '23 at 07:11
  • grep/awk/sed are line-based text utilities. Does that mean that your keys and values are meant to be sequences of 0 or more characters other than NUL and newline encoded in the locale's charset and whose length in bytes is < 1023? – Stéphane Chazelas Nov 09 '23 at 08:33
  • FYIW, zsh associative arrays (whose keys and values can be any sequence of bytes) can be tied to gdbm DBs. – Stéphane Chazelas Nov 09 '23 at 08:34
  • Um... Using jq would be "embracing the Unix philosophy" as much as using any other tool. You can make it easy for yourself by using an existing tool specialised in reading and writing the data you are interested in, in the format that allows your data to be expressed most conveniently. You have not mentioned if you deal with simple strings, key+value pairs, or data with a more complicated structure, e.g. arrays, many-to-many mappings, etc. Rewriting tools for the sake of rewriting them in a specific language strikes me as masochistic. – Kusalananda Nov 09 '23 at 08:40
  • Note that you can invoke perl or the interpreter of any other proper interpreted programming language in shells, and most of those programming languages can take the code to interpret on the command line or from a file descriptor. perl is ubiquitous (a lot more common than systems with the full POSIX toolchest let alone with fully compliant tools) and has support for (de)serialising its data structures or tying them with on-disk DB files. – Stéphane Chazelas Nov 09 '23 at 08:46
  • @StéphaneChazelas yes, just simple human readable config – Tom Huntington Nov 09 '23 at 08:47
  • @Kusalananda I kind of meant portable, it's annoying to have to install dependencies. – Tom Huntington Nov 09 '23 at 08:49
  • Then why not just use shell code that contains variable assignments (and comments)? Most humans understand things like var='value...'. – Stéphane Chazelas Nov 09 '23 at 08:50
  • @StéphaneChazelas It wouldn't preserve state between sessions – Tom Huntington Nov 09 '23 at 08:51
  • You can always dump the values of the variables as shell assignments into the file. – Stéphane Chazelas Nov 09 '23 at 08:55
  • Why not just create files in the filesystem? The filename is the key, the file content is the value. There are code examples in http://stackoverflow.com/questions/688849/associative-arrays-in-shell-scripts in the (at the moment (2023-11-10)) third answer (https://stackoverflow.com/a/691023). – rathier Nov 09 '23 at 23:08
  • @rathier The appeal of a file of key=value lines is that it is more human readable. But I agree the filesystem is a good way to implement this. also see https://unix.stackexchange.com/a/21979/551103 – Tom Huntington Nov 09 '23 at 23:54
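The filesystem approach suggested in the comments can be sketched in a few lines of portable shell: the file name is the key, the file contents are the value. The directory and function names below are made up for illustration; this assumes keys contain no `/` and are not `.` or `..`.

```shell
# One file per key: the file name is the key, the contents the value.
# "kvdir" and the fsput/fsget/fsdel names are invented for this sketch.
kvdir="./kvdir"
mkdir -p "$kvdir"

fsput() { printf '%s' "$2" > "$kvdir/$1"; }    # write/overwrite a key
fsget() { cat "$kvdir/$1" 2>/dev/null; }       # read a key (empty if absent)
fsdel() { rm -f "$kvdir/$1"; }                 # delete a key

fsput color blue
fsget color        # prints "blue"
```

Each write replaces one file atomically enough for simple uses, values may contain any bytes including newlines, and `ls "$kvdir"` lists the keys, at the cost of the single human-readable key=value file the question was after.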

1 Answer


For alphanumeric key-values:

kvfile="kvfile"

put() {
    if grep -q "^${1}=" "$kvfile"; then
        sed -i "s/^${1}=.*$/${1}=${2}/" "$kvfile"
    else
        echo "${1}=${2}" >> "$kvfile"
    fi
}

get() { grep "^${1}=" "$kvfile" | awk -F= '{printf "%s", $2}'; }

Martin Kealey actually knows how to script sed and awk and suggests:

put() {
    sed -i "/^$1=/{ h ; s/=.*/=$2/ ; } ; $ { p; g; /./d; s/^$/$1=$2/ ; }" "$kvfile";
}

get() { awk -F= -vk="$1" '$1 == k{ print $2 }' "$kvfile"; }
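To illustrate, a quick run of the sed/awk version (assuming GNU sed for `-i`; note that this `put` silently does nothing on an empty file, since sed never executes its script when there are no input lines, so the file needs at least one entry to start with):

```shell
kvfile="kvfile"

put() {
    sed -i "/^$1=/{ h ; s/=.*/=$2/ ; } ; $ { p; g; /./d; s/^$/$1=$2/ ; }" "$kvfile";
}

get() { awk -F= -vk="$1" '$1 == k{ print $2 }' "$kvfile"; }

echo "color=blue" > "$kvfile"    # seed: sed processes no lines in an empty file
put color green                  # overwrite an existing key
put shape circle                 # append a new key
get color                        # prints "green"
get shape                        # prints "circle"
```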

  • sed -i is not POSIX, using variable data inside a sed script is an ACE vulnerability. awk -F ... $2 breaks if values contain =, leaving parameter expansions unquoted has a very special meaning in POSIX sh, A && B || C cannot be used in place of proper if then else fi, behaviour of echo is unspecified if arguments may contain backslashes, the first argument of printf is the format, shouldn't be variable external data... – Stéphane Chazelas Nov 09 '23 at 10:38
  • Since you are using $1 unquoted, this would split the 1st argument's value on whitespace (by default), and then apply filename globbing on the split-up bits. The first of the resulting words would be used as a regular expression by grep while any other part would be used as input filenames. On the other hand, with sed, the value would be used as a regular expression as-is. In the sed calls, you limit the number of valid substrings that can occur in $1 and $2 since you are using them as the replacement of a s command. They can therefore not contain strings like \1 or &. – Kusalananda Nov 09 '23 at 10:38
  • What's missing is proper quoting and handling of the shell arguments, and encoding of the user-supplied data (both keys and values). Suggestions for encoding scheme: JSON or base64. jq handles both of these. – Kusalananda Nov 09 '23 at 10:41
  • The quotes in the sed command are quite scrambled (and they don't nest). Also the lack of quotes around $kvfile will mean this breaks when kvfile contains some names. I suggest put() { sed -i "/^$1=/{ h ; s/=.*/=$2/ ; } ; $ { p; g; /./d; s/^$/$1=$2/ ; }" "$kvfile" ; } in place of the existing grep+sed+echo combo (or something else if you don't have sed -i), and get() { awk -F= -vk="$1" '$1 == k { print $2 }' "$kvfile" ; } in place of the grep+awk combo – Martin Kealey Nov 13 '23 at 04:42
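Following the comment suggesting base64 as an encoding scheme, here is a hypothetical sketch (the bput/bget names are invented; GNU coreutils base64 is assumed). Encoding both keys and values makes arbitrary bytes safe in a line-based file; a space separates the two fields, since space never occurs in the base64 alphabet:

```shell
kvfile="kvfile.b64"

# Encode a string as a single base64 token (strip base64's line wrapping).
enc() { printf '%s' "$1" | base64 | tr -d '\n'; }

# Rewrite the file without any old entry for the key, then append the new one.
bput() {
    k=$(enc "$1") v=$(enc "$2")
    { [ -f "$kvfile" ] && grep -v "^$k " "$kvfile"; echo "$k $v"; } > "$kvfile.tmp"
    mv "$kvfile.tmp" "$kvfile"
}

# Find the encoded key, take the second field, decode it back to bytes.
bget() {
    k=$(enc "$1")
    grep "^$k " "$kvfile" | cut -d' ' -f2 | base64 -d
}
```

This tolerates =, spaces and newlines in keys and values, and the rewrite-then-mv avoids editing the file in place; the trade-off is that the file is no longer human-readable, which was part of the original goal.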