vectorize is not a substitute for iteration. It's a way of giving a scalar function the full power of numpy broadcasting. The resulting function takes one or more arrays, broadcasts them together, and then calls the wrapped function once per broadcast element, passing it one scalar from each array.
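To make the "one scalar from each array" point concrete, here is a minimal sketch. The function name `record` and the `calls` list are made up for illustration; they just log what the wrapped function actually receives.

```python
import numpy as np

calls = []  # hypothetical log of the arguments the wrapped function sees

def record(x, y):
    # Called once per broadcast element with one scalar from each input.
    calls.append((x, y))
    return x + y

# otypes is given so vectorize does not make an extra trial call.
fn = np.vectorize(record, otypes=[int])

col = np.array([[1], [2]])     # shape (2, 1)
row = np.array([10, 20, 30])   # shape (3,)
out = fn(col, row)             # inputs broadcast to shape (2, 3)

print(out.shape, len(calls))
```

The (2,1) and (3,) inputs broadcast to (2,3), so `record` runs 6 times, which is the same number of Python-level calls a nested loop would make.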
In your code f is a list of lines - all the lines from the file.
You could do something like:
N = len(f)
for i in range(0, N, 1000):
    a = np.loadtxt(f[i:i+1000], delimiter=',')
    # process array a here
In other words, feed the lines of f to loadtxt in blocks.
Actually, you don't need to read all of the lines at once. You could write a generator that reads the file line by line and yields blocks of lines.
The use of generators to feed loadtxt (or genfromtxt) has been discussed before.
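A rough sketch of such a generator, under some assumptions: `read_blocks` and `block_size` are names I made up, and an `io.StringIO` stands in for the real file so the snippet is self-contained. `ndmin=2` keeps a one-line final block 2d.

```python
import io
from itertools import islice

import numpy as np

def read_blocks(fileobj, block_size=1000):
    """Yield successive lists of up to block_size lines from an open file."""
    while True:
        block = list(islice(fileobj, block_size))
        if not block:
            return
        yield block

# Stand-in for a real file; with a file on disk, use open(path) instead.
sample = io.StringIO("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n")

shapes = []
for block in read_blocks(sample, block_size=2):
    arr = np.loadtxt(block, delimiter=',', ndmin=2)
    shapes.append(arr.shape)  # process the array here

print(shapes)
```

Only `block_size` lines are held in memory at a time, since the file object is consumed lazily by `islice`.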
Working example of vectorize
In [121]: def processing(astr):
   .....:     return list(map(int, astr.split(',')))[0]  # py3
   .....:
In [122]: processing(a[0])
Out[122]: 1
In [123]: fn=np.vectorize(processing, otypes=[int])
In [124]: fn(a)
Out[124]: array([1, 4, 7])
This function takes a string and returns one int. It's no better than a plain list comprehension:
In [125]: [processing(l) for l in a]
Out[125]: [1, 4, 7]
I removed delimiter and dtype from the arguments because we don't want to iterate over those parameters. There is an excluded parameter to vectorize, but I didn't want to play with that.
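For completeness, here is a sketch of what keeping `delimiter` as an argument might look like. NumPy documents the parameter as `excluded`; the sample list `a` is an assumption, reconstructed from the outputs shown in this answer.

```python
import numpy as np

# Assumed sample input, reconstructed from the outputs in this answer.
a = ['1,2,3', '4,5,6', '7,8,9']

def processing(astr, delimiter=','):
    # Return the first field of the string as an int.
    return int(astr.split(delimiter)[0])

# 'delimiter' is passed through unchanged rather than broadcast over.
fn = np.vectorize(processing, otypes=[int], excluded=['delimiter'])
result = fn(a, delimiter=',')
print(result)
```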
vectorize also takes multiple values for the otypes, but I haven't seen an example of that use. Your function didn't work because it returned a sequence (e.g. 3 ints), but vectorize expected it to return one value (a scalar float or int).
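The failure is easy to reproduce. In this sketch `a` is again an assumed sample input, and I catch both ValueError and TypeError because the exact exception may differ between NumPy versions.

```python
import numpy as np

a = ['1,2,3', '4,5,6', '7,8,9']  # assumed sample input

def processing(astr):
    # Returns a list of 3 ints, not a single scalar.
    return list(map(int, astr.split(',')))

fn = np.vectorize(processing, otypes=[int])

failed = False
try:
    fn(a)
except (ValueError, TypeError) as err:
    # The per-element result cannot be stored in a single int slot.
    failed = True
    print(type(err).__name__, err)
```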
If you specify otypes as object, your processing does work - sort of
In [126]: def processing(astr):
   .....:     return list(map(int, astr.split(',')))  # py3
   .....:
In [127]: fn=np.vectorize(processing, otypes=[object])
In [128]: fn(a)
Out[128]: array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=object)
But why not just iterate?
In [129]: [processing(l) for l in a]
Out[129]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [130]: np.array([processing(l) for l in a])
Out[130]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
I vaguely recall some SO questions about errors when using vectorize with object returns.
While I'm on a roll, I might as well illustrate broadcasting:
Here's a version of your function that takes 2 values, a string and a conversion function, and applies the conversion to each element of the split string:
In [131]: def processing(astr, conv):
   .....:     return list(map(conv, astr.split(',')))  # py3
   .....:
In [132]: fn=np.vectorize(processing, otypes=[object])
Now the vectorized function takes 2 inputs, e.g. a list and function:
In [133]: fn(a,int)
Out[133]: array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=object)
or a different function for each string in a (2 lists)
In [134]: fn(a,[int,float,str])
Out[134]: array([[1, 2, 3], [4.0, 5.0, 6.0], ['7', '8', '9']], dtype=object)
Or make the 2nd argument a 'column' list of shape (2,1), which broadcasts against the 3 strings to give a (2,3) array of lists. One row is ints, the other floats. I could equally replace the lists with arrays (0d, 1d, 2d, etc.).
In [136]: fn(a,[[int],[float]])
Out[136]:
array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
       [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]], dtype=object)
If you need this kind of flexibility in inputs then use vectorize. But if you are just iterating over one array or list - do it directly.
I found an example of multiple otypes: https://stackoverflow.com/a/30255971/901925
Applied to this case:
In [140]: def processing(astr):
   .....:     return tuple(map(int, astr.split(',')))  # py3
   .....:
It's important that it returns a tuple, not a list or array.
In [141]: processing(a[0])
Out[141]: (1, 2, 3)
In [142]: fn=np.vectorize(processing, otypes=[int,int,int])
Note that there has to be an otype for each item of the returned tuple.
In [144]: fn(a)
Out[144]: (array([1, 4, 7]), array([2, 5, 8]), array([3, 6, 9]))
But note that [1, 4, 7] holds the first value from each of the 3 input strings. The function returns a tuple of arrays, one per output position, not one array.
In [146]: x,y,z=fn(a)
In [147]: x
Out[147]: array([1, 4, 7])
This behavior bothered the other questioner, and I doubt if it's what you want either. :)
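If one 2d array is what you're after, the tuple can be stacked back together. A small sketch, again assuming `a` is the list of three comma-separated strings implied by the outputs above; `np.column_stack` is my choice here, not something taken from the linked answer.

```python
import numpy as np

a = ['1,2,3', '4,5,6', '7,8,9']  # assumed sample input

def processing(astr):
    # Must return a tuple, one item per entry in otypes.
    return tuple(map(int, astr.split(',')))

fn = np.vectorize(processing, otypes=[int, int, int])
x, y, z = fn(a)

# Each of x, y, z becomes one column of the combined array.
combined = np.column_stack((x, y, z))
print(combined)
```

That said, np.array([processing(l) for l in a]) gets the same result with less machinery.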
https://stackoverflow.com/a/30088791/901925 - a vectorizing example with time tests.