I prefer Apache Arrow to the CSV libraries I used to use in Rust, since defining a fixed struct does not suit my tasks. I only want to process a few columns with particular names from the header. I don't want to fix my code just because people add more columns or change the order of the columns.

#rustlang #csv #rust #arrow

@veer66 you don’t have to. Which csv crate are you using, and did you use serde’s macros to map the header fields to whatever you wanted them to be? Extra stuff doesn’t matter.

@ajmartinez The process is reading a CSV file, adding a column calculated from another column, and writing everything to a new CSV file. With a struct, reading a file that has some extra columns should be okay. I don't know how to write everything, though, because if the extra columns aren't in the struct, they won't be deserialized and so they won't be written.
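
The read, derive, and write process described in this reply can be sketched without a fixed struct by looking the source column up by its header name. This is only a minimal sketch under stated assumptions: it uses the standard library with a naive comma split (so no quoted fields), it is not the Arrow-based code discussed in the thread, and the function name `add_derived_column` and its parameters are hypothetical.

```rust
/// Appends a derived column to CSV text, locating the source column by its
/// header name so that extra or reordered columns pass through untouched.
/// Assumption: fields contain no quoted commas or embedded newlines.
fn add_derived_column(
    input: &str,
    source: &str,
    derived: &str,
    f: impl Fn(f64) -> f64,
) -> Result<String, String> {
    let mut lines = input.lines();
    let header = lines.next().ok_or("empty input")?;

    // Find the source column by name, not by position.
    let idx = header
        .split(',')
        .position(|name| name.trim() == source)
        .ok_or_else(|| format!("missing column: {source}"))?;

    let mut out = format!("{header},{derived}\n");
    for line in lines.filter(|l| !l.is_empty()) {
        let field = line
            .split(',')
            .nth(idx)
            .ok_or_else(|| format!("short row: {line}"))?;
        let value: f64 = field
            .trim()
            .parse()
            .map_err(|e| format!("bad number {field:?}: {e}"))?;
        // Every original field is copied as-is; only the new value is appended.
        out.push_str(&format!("{line},{}\n", f(value)));
    }
    Ok(out)
}
```

Because each row is copied whole and only one field is parsed, columns whose names are unknown today still reach the output file. A real program would more likely rely on a CSV or Arrow library for quoting and typing.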

@veer66 you can use Option<T> to make your destination field optional on read, and then set it to Some(value) before you write it. Make sure you read the csv into a mutable struct and you should be good to go.
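
The Option<T> suggestion above might look roughly like the following. This is a sketch only: the struct `Row`, its field names, and the helper `fill_derived` are hypothetical, and the serde derive attributes that would do the actual CSV mapping are left out so the snippet needs nothing beyond the standard library.

```rust
/// One row with a known input field and an optional derived field.
/// Hypothetical names; a serde-based version would also derive
/// Deserialize and Serialize on this struct.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    price: f64,
    /// None when the row is read, filled in before the row is written.
    price_with_tax: Option<f64>,
}

/// Fills the derived field in place, which is why the rows must be mutable.
fn fill_derived(rows: &mut [Row], rate: f64) {
    for row in rows.iter_mut() {
        row.price_with_tax = Some(row.price * (1.0 + rate));
    }
}
```

Note that this pattern only carries the fields named in the struct, so any column not declared there is still dropped on write.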

@ajmartinez The known added column is okay, but many more columns will be added to the input file in a few days, and their names are unknown for now.

@veer66 have you tried the community help chat on Discord? Lots of helpful and very knowledgeable folks there who can probably help.

@ajmartinez No, I haven't since Apache Arrow satisfies me.

@veer66 good deal. I generally turn to Python when my Rust chops aren’t enough. If Apache Arrow handles your use case rock on with it!

@ajmartinez I switched to Python two weeks ago and found that my program turned out to be very slow. 😪 So I switched back to Rust. Now I feel that using Arrow in Rust is as convenient as using PyArrow. 😍

@veer66 I know the feeling. One of my Advent of Code solutions, which would disgust a real software engineer with its inefficient algorithm, still *finishes execution* before Python can even launch its interpreter. Still, Python remains a useful tool with which I have built quite a few mission-critical tools.

@ajmartinez In many cases, Python finishes execution before the Rust compiler finishes building an executable file. A few days ago, I ran PyTorch to train an LSTM. The Python version took 35 minutes and the Rust version took 34 minutes, a difference that doesn't really matter.

Last week I processed my dataset and switched to Rust when the Python version couldn't finish the task in six hours, while the Rust version took 22 seconds. Later Sorrawut Kittikeereechaikun showed me that I could avoid converting Arrow/NumPy numbers to Python numbers too often. With that change, the Python version is only 4 times slower than the Rust version.

I mean Python is very useful and not too slow for most of my use cases. Rust helps me get a long task done without being clever.

@veer66 That's true, but unless you're running a daemon, that interpreter startup time is paid on every execution, while the build cost (for a given version) is paid exactly once. Of course, I think this is amplified on simpler problems where the comparison is fairly simple std lib vs std lib work. Otherwise, like most things, it's about the right tool for the job, and the compile time certainly factors in!

Librem Social is an opt-in public network. Messages are shared under Creative Commons BY-SA 4.0 license terms.
