pandas



My Linked Notes

  • 2020-10-27
    • For both Ray and Dask, you can pip install the library and start running it locally. Both can also be used to parallelize [[pandas]], which is the go-to tool for most data scientists
      • This is really important: data scientists don't want to have to learn Spark, but for big data processing jobs, technologies like Spark used to be the only way
      • Dask and Ray provide a "native" Python way to run parallel computations. And of course, their parent businesses are going to let users pay for a reliable, easy-to-use environment to run that software
