Questions tagged [groupby]

A groupby operation splits the data into groups, applies a function to each group, and combines the results.

The Pandas library offers a handy explanation.
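As a minimal sketch of the split-apply-combine pattern described above, using pandas and a hypothetical toy frame (the column names `team` and `score` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 2, 3, 4, 5],
})

# Split the rows by "team", apply sum() to each group's "score",
# and combine the per-group results into a single Series.
totals = df.groupby("team")["score"].sum()
print(totals)
```

The result is indexed by the group key, so `totals["a"]` gives the sum for team `a`.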

31 questions
5
votes
1 answer

How to use df.groupby() to select and sum specific columns w/o pandas trimming total number of columns

I have Column1, Column2, Column3, Column4, Column5, and Column6. I'd like to group by Column1 and get the row sum of Column3, Column4, and Column5. When I apply groupby() I get the correct result, but it leaves out Column6: df =…
Steven
  • 129
  • 2
  • 5
  • 16
3
votes
0 answers

User defined aggregations on data of around 200GB where row order matters

I am working with "medium large" data of around 200GB. The data are long form log files, where there are several thousand logs for each "entity". The entities are actually flights and each log entry occurs at a different time stamp. Temporal order…
Placidia
  • 226
  • 1
  • 5
2
votes
1 answer

Normalize data from different groups

I have data that has been grouped into 27 groups by different criteria. The reason for these groupings is to show that each group has different behavior. However, I would like to normalize everything to the same scale. For example, I would like to…
formicaman
  • 141
  • 2
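The excerpt above is cut off before it says which scale is wanted, so the following is only a sketch under an assumption: a per-group z-score computed with `groupby(...).transform`. The frame and the column names `group` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical data: two groups on very different scales.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = df.groupby("group")["value"]

# transform() returns a result aligned with the original rows, so each
# value is centered and scaled by its own group's mean and sample std.
# Assumption: z-scoring is the desired normalization.
df["value_z"] = (df["value"] - grouped.transform("mean")) / grouped.transform("std")
print(df)
```

Other choices, such as min-max scaling within each group, would follow the same `transform` pattern.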
1
vote
1 answer

pandas groupby.count doesn't count zero occurrences

I am using groupby.count for 2 columns to get value occurrences under a class constraint. However, if value $x$ in feature never occurs with class $y$, then this pandas method returns only non-zero frequencies. Is there any solution or alternate…
Şafak
  • 23
  • 6
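For the zero-count situation described above, one possible approach (a sketch, not necessarily what the asker needs) is to count with `size()` and then `unstack` with `fill_value=0`, so that combinations that never occur show up as 0. The column names `feature` and `cls` and the data are hypothetical.

```python
import pandas as pd

# Hypothetical data: feature value "q" never occurs with class "c2".
df = pd.DataFrame({
    "feature": ["p", "p", "q", "q", "q"],
    "cls": ["c1", "c2", "c1", "c1", "c1"],
})

# size() counts rows per (feature, cls) pair; unstack() pivots cls into
# columns and fill_value=0 fills the pairs that never occurred.
counts = df.groupby(["feature", "cls"]).size().unstack(fill_value=0)
print(counts)
```

`pd.crosstab(df["feature"], df["cls"])` builds a similar table and also reports zeros for missing combinations.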
1
vote
0 answers

MongoDB Groupby Rank

I'm working with MongoDB and want to write a query using the aggregate function. The query is: each city has several zip codes. Find the city in each state with the most zip codes and rank those cities along with the states using the city…
Noob
  • 11
  • 1
1
vote
1 answer

Using user defined function in groupby

I am trying to use the groupby functionality in order to do the following given this example dataframe: dates = ['2020-03-01','2020-03-01','2020-03-01','2020-03-01','2020-03-01', …
1
vote
0 answers

Why is the behaviour of groupby command different in R and in python with pandas?

If I execute the following lines of code in R as well as in Python for the groupby command, the order of the resulting dataframes does not match when I convert them to dataframes afterwards. How do I keep the order the same when I execute the R code…
Malathi
  • 135
  • 1
  • 1
  • 6
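The asker's code is not shown in the excerpt above, so this is only a possibly relevant observation: pandas `groupby` sorts the group keys by default, and passing `sort=False` keeps them in order of first appearance. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data; the keys first appear in the order b, a, c.
df = pd.DataFrame({
    "key": ["b", "a", "b", "c"],
    "val": [1, 2, 3, 4],
})

# Default behaviour: group keys are sorted.
sorted_keys = list(df.groupby("key")["val"].sum().index)

# sort=False: group keys follow their order of first appearance.
appearance_keys = list(df.groupby("key", sort=False)["val"].sum().index)

print(sorted_keys, appearance_keys)
```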
1
vote
1 answer

pandas groupby and sort values

I am studying for an exam and encountered this problem from past worksheets: This is the data frame called 'contest' with granularity as each submission of question from each contestant in the math contest. The question is and the answer is in…
JChang
  • 13
  • 2
1
vote
1 answer

Pandas - Sum of multiple specific columns

I created this script: import pandas as pd pd.set_option('display.min_rows', None) pd.set_option('display.max_columns', None) df = pd.read_excel('file.xlsx', sep=';', skiprows=6) df = df.drop(['Position',…
Steven
  • 129
  • 2
  • 5
  • 16
1
vote
0 answers

Custom DataFrame format for exporting to excel sheets

I have the following DataFrame and I want it exported in an excel file with different sheets having different words as shown in the image 0 2020-06-14 coronavirus es -0.021277 coronavirus is -0.006024 1 2020-06-13 coronavirus es -0.021277…
m2rik
  • 321
  • 2
  • 11
1
vote
1 answer

How to group by one column and count frequency from other column for each item in the previous column in python?

I am trying to group my data by the 'ID' column. Then I want to count the frequency of 'sequence' for each 'ID'. Here is a sample of the data frame: ID Sequence 101 1-2 101 3-1 101 1-2 102 4-6 102 7-8 102 4-6 102 4-6 103 …
Farah
  • 21
  • 3
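One way to get per-ID frequencies for the question above is to group by both columns and count rows. This sketch uses only the sample rows visible in the excerpt (the rows for ID 103 are cut off and are left out); the output column name `count` is an arbitrary choice.

```python
import pandas as pd

# Rows copied from the visible part of the sample; ID 103 is truncated
# in the excerpt and therefore omitted.
df = pd.DataFrame({
    "ID": [101, 101, 101, 102, 102, 102, 102],
    "Sequence": ["1-2", "3-1", "1-2", "4-6", "7-8", "4-6", "4-6"],
})

# Count how often each Sequence occurs within each ID.
freq = df.groupby(["ID", "Sequence"]).size().reset_index(name="count")
print(freq)
```

`df.groupby("ID")["Sequence"].value_counts()` gives the same counts as a Series with a two-level index.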
1
vote
1 answer

Group_by 2 variables and pivot_wider distribution based on 2 others

Performing some calculations on a dataframe and stuck trying to calculate a few percentages. Trying to append 3 additional columns added for %POS/NEG/NEU. E.g., the sum of amount col for all observations w/ POS Direction in both Drew & A/total sum…
DataGuy23
  • 31
  • 1
  • 4
1
vote
0 answers

Pushing down Group By clause

I studied this exercise in class, but I cannot figure out why, when I push down the Group By clause, I can remove the RLCode attribute from the Group By. Does this action change the meaning of the query tree?
Lorenzoi
  • 25
  • 4
1
vote
0 answers

Finding unique features across all groups in a dataframe

I have a networking dataset, and it seems the data is coming from different channels. So the same timestamp is repeated for different rows, which makes it meaningless to do time series analysis. My goal is to see if there is a way to extract only one…
Hanna
  • 111
  • 2