I used PySpark some time ago when it was introduced at my company, and I noticed it was slow whenever UDFs called regular Python libraries rather than PySpark's own built-in functions.
Yes, using Python UDFs within Spark pipelines is a hog! Part of it is that the UDF is serialized with cloudpickle and sent over the wire to the executor nodes, and cloudpickle captures not just the function but everything it closes over — if your UDF references large objects in the driver process, all of that gets pickled and shipped too (potentially gigabytes, depending on the UDF and what's in the driver's Python context). On top of that, every row has to be serialized back and forth between the JVM and the Python worker processes, which is overhead that Spark's native functions avoid entirely.
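To make the closure-capture point concrete, here's a minimal sketch using the stdlib `pickle` module as a stand-in for cloudpickle (which Spark actually uses, but isn't in the standard library). The `big_lookup` dict and `udf_body` function are hypothetical names for illustration; the point is that any large object a UDF references would ride along with it to every executor:

```python
import pickle

# A large driver-side object that a careless UDF might close over.
big_lookup = {i: i * 2 for i in range(1_000_000)}

def udf_body(x):
    # References big_lookup, so cloudpickle would serialize the whole
    # dict as part of the UDF's closure when shipping it to executors.
    return big_lookup.get(x, 0)

# Plain pickle can't serialize the closure itself, but it can show the
# size of the captured state that would travel with it.
payload = pickle.dumps(big_lookup)
print(f"captured state alone: {len(payload) / 1e6:.1f} MB")
```

Broadcasting big read-only objects explicitly (`spark.sparkContext.broadcast(...)`) instead of capturing them in the closure is the usual way around this, since broadcast variables are shipped to each executor once rather than re-pickled with the function.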