    df = pd.DataFrame({
        'a': range(1000000),
        'b': range(1000000, 2000000)
    })

    # Standard pandas (single core)
    df['c'] = df.apply(lambda row: row['a'] * row['b'], axis=1)

    # Pandarallel (fast)
    df['c'] = df.parallel_apply(lambda row: row['a'] * row['b'], axis=1)

2. Parallel map on Series

    def slow_function(x):
        return x ** 2 + x * 3

    series = pd.Series(range(100000))

    # Parallel version
    result = series.parallel_map(slow_function)

3. Parallel applymap for element-wise operations

    df = pd.DataFrame(np.random.rand(1000, 1000))

    def complex_func(x):
        return np.log(x + 1) * np.sin(x)

    # Apply to every element in parallel
    result = df.parallel_applymap(complex_func)

4. Parallel groupby-apply

    df = pd.DataFrame({
        'group': np.random.choice(['A', 'B', 'C'], 100000),
        'value': np.random.randn(100000)
    })

    def group_operation(group):
        return group['value'].mean() + group['value'].std()
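A serial baseline is a convenient way to check what a grouped parallel apply should return. The sketch below is only an illustration: it reuses `group_operation` from the text on a smaller, seeded frame (the size and seed are assumptions made for reproducibility) and runs it with plain pandas. The pandarallel call appears only as a comment, since it needs the library installed and initialized.

```python
import numpy as np
import pandas as pd

# Small, seeded stand-in for the 100000-row frame (size and seed are illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=1_000),
    "value": rng.standard_normal(1_000),
})


def group_operation(group):
    # Same per-group statistic as in the text: mean plus standard deviation.
    return group["value"].mean() + group["value"].std()


# Serial baseline with plain pandas. Selecting only "value" keeps the
# grouping column out of the frames handed to group_operation.
serial_result = df.groupby("group")[["value"]].apply(group_operation)
print(serial_result)

# With pandarallel installed and initialized, the parallel counterpart
# would be (not executed here):
#   from pandarallel import pandarallel
#   pandarallel.initialize()
#   parallel_result = df.groupby("group").parallel_apply(group_operation)
```

The result is a Series indexed by group label, so a parallel run can be compared against it value by value, for example with `np.allclose`.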
    def heavy_func(x):
        return sum(np.sin(x) * np.cos(x) for _ in range(100))

    # Pandas
    start = time.time()
    result_pd = df['x'].apply(heavy_func)
    print(f"Pandas: {time.time() - start:.2f}s")

    # Pandarallel
    start = time.time()
    result_pll = df['x'].parallel_apply(heavy_func)
    print(f"Pandarallel: {time.time() - start:.2f}s")

Common Issues & Solutions

1. PicklingError (lambdas with closures)

    # This may fail with a PicklingError
    df.parallel_apply(lambda row: row['a'] + external_var, axis=1)

    # Solution: Define a regular function
    def add_external(row):
        return row['a'] + external_var

pandarallel
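To see why a lambda can trigger the PicklingError described above, it helps to test serialization directly. This is a minimal sketch using only the standard `pickle` module. It assumes the function reaches worker processes through pickle-style serialization, and pandarallel's own serializer may behave differently. `is_picklable` and `row_lambda` are hypothetical names introduced for the illustration.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # pickle raises PicklingError (or AttributeError on some versions)
        # for objects it cannot look up by name.
        return False


external_var = 10

# A lambda has no importable name, so pickle cannot store a reference to it.
row_lambda = lambda row: row["a"] + external_var


# A regular module-level function has a name that pickle can look up.
def add_external(row):
    return row["a"] + external_var


print("lambda picklable:", is_picklable(row_lambda))
print("named function picklable:", is_picklable(add_external))
print("add_external({'a': 1}) =", add_external({"a": 1}))
```

Running a check like this before a long parallel job can catch a serialization problem early, without starting any worker processes.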
What is Pandarallel?

Pandarallel is a Python library that parallelizes pandas operations with minimal code changes. It lets you replace standard pandas apply, map, and similar functions with parallelized versions that use all CPU cores of your machine.

Installation

    pip install pandarallel

For full features (progress bars, etc.):

df = pd