25

I'm working on a Kaggle challenge (Telstra Network Disruption) where some variables are stored in rows instead of columns. I'm looking for the Python equivalents of gather(), separate() and spread() from R's tidyr package.

Ethan
cpumar

4 Answers

8

I'd start with the melt() function in pandas. I wrote an article about it:

https://www.ibm.com/developerworks/community/blogs/jfp/entry/Tidy_Data_In_Python?lang=en
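To make the melt() suggestion concrete, here is a minimal sketch of how the three tidyr verbs are commonly approximated in plain pandas. The DataFrames and column names (`id`, `y2014`, `key`, and so on) are made up for illustration and are not taken from the Telstra data; treat this as one possible mapping rather than an exact port of tidyr.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per year.
wide = pd.DataFrame({
    "id": [1, 2],
    "y2014": [10, 30],
    "y2015": [20, 40],
})

# gather(): collapse the year columns into (year, value) pairs.
long = wide.melt(id_vars="id", var_name="year", value_name="value")

# spread(): turn the (year, value) pairs back into one column per year.
back = long.pivot(index="id", columns="year", values="value").reset_index()
back.columns.name = None  # drop the leftover "year" label on the column index

# separate(): split one string column into two (hypothetical "key" column).
codes = pd.DataFrame({"key": ["event_type 11", "event_type 15"]})
codes[["kind", "number"]] = codes["key"].str.split(" ", n=1, expand=True)
```

Note that pivot() raises an error if an (index, columns) pair appears more than once; in that case pivot_table() with an aggregation function is the usual fallback.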

Ethan
JFP
4

R's gather() essentially reshapes data from wide to long format. So:

  1. check the pandas documentation for how to use pandas.wide_to_long(),
  2. check this blog for a discussion on getting an elegant gather-like function in Python.
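As a rough illustration of the first point, pandas.wide_to_long() expects the wide columns to share a common prefix (the "stub") followed by a suffix. The table below and its column names (`id`, `score2014`, `score2015`) are invented for this sketch.

```python
import pandas as pd

# Hypothetical wide table: columns share the stub "score" plus a year suffix.
wide = pd.DataFrame({
    "id": [1, 2],
    "score2014": [10, 30],
    "score2015": [20, 40],
})

# i names the identifier column(s); j names the new column that receives
# the suffix stripped from each wide column (here, the year).
long = pd.wide_to_long(wide, stubnames="score", i="id", j="year").reset_index()
```

The result has one row per (id, year) pair with a single `score` column. When the wide columns do not follow a stub-plus-suffix naming pattern, melt() is usually the simpler choice.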
ximiki
2

I tried to mimic the syntax of R's tidyr package in a Python package called tidypython. It is compatible with the dplython package, which supports chaining commands with the >> operator.

It hasn't been fully tested, but should work pretty well:

https://github.com/durrantmm/tidypython

Let me know if it works for you.

0

There is a port of tidyr in Python:

https://github.com/pwwang/datar

Disclaimer: I am the author.