I have a dataframe, which is sorted in descending order by the abc column (and needs to remain that way). So, beginning with the first row (i.e., where ID is 1259 and value is 4054), I want to keep the next row that has a lower value in the value column, which would be row 2 (i.e., value is 3902). Then, based on that value in row 2 (3902), I would then want to select the next lowest value row (i.e., 3801 in row 4). Then, based on that value in row 4 (3801), I would select the next lowest value row (i.e., rows 5-7 with a value of 1314), and so on. I also want to retain rows that are selected that have duplicate values (e.g., 1314 value for rows 5-7).
Data
# A tibble: 42 × 3
ID abc value
<int> <dbl> <dbl>
1 1259 4530 4054.
2 1617 4415 3902.
3 2441 4250 5286.
4 539 4161 3801.
5 1471 4012 1314.
6 687 3934 1314.
7 2436 3900 1314.
8 1295 3591 3558.
9 1738 3493 1188.
10 1747 3366 3902.
# … with 32 more rows
df <- structure(list(ID = c(1259L, 1617L, 2441L, 539L, 1471L, 687L,
2436L, 1295L, 1738L, 1747L, 672L, 2187L, 2214L, 738L, 1784L,
726L, 1712L, 2961L, 137L, 1188L, 1824L, 2856L, 953L, 2579L, 2010L,
484L, 2769L, 1736L, 2994L, 1572L, 586L, 2462L, 2180L, 1404L,
1472L, 1007L, 3000L, 545L, 2098L, 2029L, 2557L, 1623L), abc = c(4530,
4415, 4250, 4161, 4012, 3934, 3900, 3591, 3493, 3366, 3248, 3172,
3172, 3004, 3004, 2939, 2853, 2782, 2541, 2439, 2371, 2116, 2005,
1999, 1838, 1749, 1611, 1573, 1416, 1301, 1177, 1074, 975, 805,
729, 680, 538, 494, 309, 297, 155, 0), value = c(4053.69672793727,
3902.10003271411, 5286.47420784735, 3801.25160651053, 1314.10007572174,
1314.10007572174, 1314.10007572174, 3557.56126600696, 1188.20000553131,
3902.10003271411, 1543.90001773834, 1543.90001773834, 1543.90001773834,
1228.10004138947, 1228.10004138947, 1618.80007553101, 1618.80007553101,
2787.70004177094, 2877.32907307533, 1289.59995269775, 2640.10003471375,
1618.80007553101, 3557.56126600696, 3902.10003271411, 1228.10004138947,
1329.40004730225, 1228.10004138947, 3557.56126600696, 1618.80007553101,
1206.79998779297, 2117.7806730578, 1206.79998779297, 1206.79998779297,
3260.02910244849, 1191.90001773834, 1191.90001773834, 1029.70001125336,
1336.000041008, 1091.09995365143, 1234.89997673035, 1164.4999704361,
1191.90001773834)), row.names = c(NA, -42L), class = c("tbl_df",
"tbl", "data.frame"))
Expected Output
ID abc value
<int> <dbl> <dbl>
1 1259 4530 4054.
2 1617 4415 3902.
3 539 4161 3801.
4 1471 4012 1314.
5 687 3934 1314.
6 2436 3900 1314.
7 1738 3493 1188.
8 3000 538 1030.
I know I could do something like this answer to find the next lowest value for each row, but I am unsure how to go through and filter. Since this can skip several rows, using lag and lead did not work for me. I would like to be able to do this with tidyverse or data.table or the apply family. I would like to avoid using a for loop, unless that's the only way. I have a larger dataset, so I am looking for an efficient way to do this.