How to create a new column based on the values from multiple columns which are matching a particular string?

Question

I have a data frame that looks like this:

df=data.frame(
  eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
  eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
  eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
  eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
  eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
)

In reality I have many more columns, their names don't always match the string "eye_problemsdisorders_f6148", and there are many more rows.

What I would like to do is create a new column, named, say, "case", with the value 1 for every row where the string "A" appears in at least one column, and 0 otherwise. In the example above, "case" would have these values: 1,1,0,1,1,1,0,0,1,1
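One vectorized way to do this in base R, sketched below, is to compare the whole block of columns with "A" at once and count the matches in each row. Selecting the columns by a name prefix is an assumption made for illustration only; since the real column names vary, substitute whatever pattern or explicit list of names fits.

```r
# The example data from the question
df <- data.frame(
  eye_problemsdisorders_f6148_0_1 = c("A","C","D",NA,"D","A","C",NA,"B","A"),
  eye_problemsdisorders_f6148_0_2 = c("B","C",NA,"A","C","B",NA,NA,"A","D"),
  eye_problemsdisorders_f6148_0_3 = c("C","A","D","D","B","A",NA,NA,"A","B"),
  eye_problemsdisorders_f6148_0_4 = c("D","D",NA,"B","A","C",NA,"C","A","B"),
  eye_problemsdisorders_f6148_0_5 = c("C","C",NA,"D","B","C",NA,"D","D","B"),
  stringsAsFactors = FALSE
)

# Columns to search, picked by name prefix (illustrative; adapt to real names)
cols <- grep("^eye_problemsdisorders_f6148", names(df))

# df[cols] == "A" yields a logical matrix with NA wherever the cell is NA;
# rowSums(..., na.rm = TRUE) counts the "A" cells per row, ignoring the NAs
df$case <- as.integer(rowSums(df[cols] == "A", na.rm = TRUE) > 0)

df$case
# [1] 1 1 0 1 1 1 0 0 1 1
```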

You might prevent misdirected answers by also using the body of the question (in addition to the tags) to limit your desired answer ("in R", as in your response to the existing answer). Tags reflect the software in use, and aren't always exclusive to an answer. Thanks! — Jeff Schaller, Jul 29 '19 at 19:06

steeldriver · Answer 1 · 2019-07-29T22:13:04.003

Given

> df=data.frame(
+   eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
+   eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
+   eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
+   eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
+   eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
+ )

then

> f = function(x) any(x == "A", na.rm = TRUE)
> 
> apply(df, MARGIN = 1, FUN = f)
 [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
>
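The na.rm = TRUE argument in f is what makes rows such as row 3 (no "A", several NA cells) come out FALSE rather than NA. A small sketch of the difference, using two hypothetical rows:

```r
# A row with no "A" but some missing values
x <- c("D", NA, "D", NA, NA)

x == "A"                     # FALSE NA FALSE NA NA
any(x == "A")                # NA: the missing cells might have been "A"
any(x == "A", na.rm = TRUE)  # FALSE: the NAs are dropped before testing

# A row that does contain "A" is TRUE either way, since one TRUE is enough
y <- c(NA, "A", "D", "B", "D")
any(y == "A")                # TRUE
```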

Coercing the logical TRUE/FALSE values to numeric 1/0 and adding the result as a new column:

> df$case <- as.numeric(apply(df, MARGIN = 1, FUN = f))
> df
   eye_problemsdisorders_f6148_0_1 eye_problemsdisorders_f6148_0_2
1                                A                               B
2                                C                               C
3                                D                            <NA>
4                             <NA>                               A
5                                D                               C
6                                A                               B
7                                C                            <NA>
8                             <NA>                            <NA>
9                                B                               A
10                               A                               D
   eye_problemsdisorders_f6148_0_3 eye_problemsdisorders_f6148_0_4
1                                C                               D
2                                A                               D
3                                D                            <NA>
4                                D                               B
5                                B                               A
6                                A                               C
7                             <NA>                            <NA>
8                             <NA>                               C
9                                A                               A
10                               B                               B
   eye_problemsdisorders_f6148_0_5 case
1                                C    1
2                                C    1
3                             <NA>    0
4                                D    1
5                                B    1
6                                C    1
7                             <NA>    0
8                                D    0
9                                D    1
10                               B    1
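Note that apply(df, ...) searches every column, including any unrelated columns and, on a second run, case itself. A sketch that restricts the same function f to the columns of interest, on a small hypothetical data frame whose extra column other contains "A" in every row:

```r
f <- function(x) any(x == "A", na.rm = TRUE)

# Hypothetical data: two columns to search plus one that should be ignored
df <- data.frame(
  other = c("A", "A", "A"),
  eye_problemsdisorders_f6148_0_1 = c("A", "C", "D"),
  eye_problemsdisorders_f6148_0_2 = c("B", "C", NA),
  stringsAsFactors = FALSE
)

# Searching all columns picks up the "A" values in `other`
as.integer(apply(df, MARGIN = 1, FUN = f))   # [1] 1 1 1

# Restrict the search to the columns selected by name prefix (illustrative)
cols <- startsWith(names(df), "eye_problemsdisorders_f6148")
df$case <- as.integer(apply(df[, cols, drop = FALSE], MARGIN = 1, FUN = f))
df$case                                       # [1] 1 0 0
```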

score 0 · Answer 2 · answered Jul 29 '19 at 17:50


I am going to get downvoted again for a short answer, but here is one:

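awk '{ hit = 0; for (i = 1; i <= NF; i++) if ($i == "A") hit = 1; printf "%d", hit }' datafile

This compares each whitespace-separated field with "A" exactly; a plain /A/ match on the whole line would also fire on every NA. You need printf here because print would add a newline after each value. If you want/need the commas, you can add them.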
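Since the asker needs R, here is a sketch of the same field-by-field test in R, with the commas added by paste(). It is not part of the original answer; the short column names v1 to v5 are stand-ins for the question's long names.

```r
# The question's data, with shortened (hypothetical) column names
df <- data.frame(
  v1 = c("A","C","D",NA,"D","A","C",NA,"B","A"),
  v2 = c("B","C",NA,"A","C","B",NA,NA,"A","D"),
  v3 = c("C","A","D","D","B","A",NA,NA,"A","B"),
  v4 = c("D","D",NA,"B","A","C",NA,"C","A","B"),
  v5 = c("C","C",NA,"D","B","C",NA,"D","D","B"),
  stringsAsFactors = FALSE
)

# col %in% "A" is FALSE (not NA) for missing cells, so no na.rm is needed;
# Reduce(`|`, ...) ORs the per-column results together, one column at a time
flags <- Reduce(`|`, lapply(df, function(col) col %in% "A"))

paste(as.integer(flags), collapse = ",")
# [1] "1,1,0,1,1,1,0,0,1,1"
```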


number9


Hi, thanks for getting back to me, but I need this in R. – anikaM Jul 29 '19 at 17:51

Hrm, I re-read your description and I do not see that anywhere. You may actually have more luck if you post it to the statistics Stack Exchange site: https://stats.stackexchange.com/questions/tagged/r This may also prove useful: https://www.regular-expressions.info/rlanguage.html – number9 Jul 29 '19 at 18:03
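If the values are matched with a regular expression in R rather than with ==, the pattern should be anchored, for the same reason as in the awk answer: an unanchored "A" also matches a literal "NA" string. A sketch using base R's grepl() on hypothetical values:

```r
# "NA" here is a literal string; the last element is a real missing value
x <- c("A", "NA", "BA", "B", NA)

grepl("A", x)    # TRUE TRUE TRUE FALSE FALSE: matches "A" anywhere in the value
grepl("^A$", x)  # TRUE FALSE FALSE FALSE FALSE: the whole value must be "A"

# Row-wise flag with the anchored pattern on a small hypothetical data frame;
# grepl() returns FALSE for NA, so missing cells never count as a match
df <- data.frame(a = c("A", "NA", NA), b = c("B", "BA", "A"),
                 stringsAsFactors = FALSE)
df$case <- as.integer(Reduce(`|`, lapply(df, function(col) grepl("^A$", col))))
df$case          # [1] 1 0 1
```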
