Introducing our first agent: Lychee
A Spark Incident Response Agent that helps you debug and optimize your data pipelines
#data-issues
Automated Reporting
Lychee analyzes issues and generates reports automatically
x_join.py
# Before
from pyspark.sql import SparkSession

# Parentheses allow the builder chain to span several lines
spark = (
    SparkSession.builder
    .appName("InefficientJoin")
    .getOrCreate()
)

# Inefficient join operation
orders = spark.read.parquet("s3://bucket/orders")
customers = spark.read.parquet("s3://bucket/customers")

# Missing broadcast hint for small table
result = orders.join(
    customers,
    orders.customer_id == customers.id,
    "inner"
)
x_join.py
# After
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Parentheses allow the builder chain to span several lines
spark = (
    SparkSession.builder
    .appName("OptimizedJoin")
    .getOrCreate()
)

# Optimized join with broadcast hint
orders = spark.read.parquet("s3://bucket/orders")
customers = spark.read.parquet("s3://bucket/customers")

# Using broadcast join for small table
result = orders.join(
    broadcast(customers),
    orders.customer_id == customers.id,
    "inner"
)
# Query runs 5x faster with broadcast join
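The broadcast hint in the example above only pays off when the hinted table is small enough to ship to every executor. As a rough sketch of that decision (not Lychee's actual logic, which the page does not describe), the hypothetical helper below compares a table's estimated size against Spark's documented default for spark.sql.autoBroadcastJoinThreshold, 10 MB. The function name and parameters are assumptions made for illustration.

```python
# Spark's documented default for spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def should_broadcast(table_size_bytes,
                     threshold_bytes=DEFAULT_BROADCAST_THRESHOLD_BYTES):
    """Hypothetical helper: True when a table looks small enough to broadcast.

    A negative threshold mirrors Spark's convention that -1 disables
    automatic broadcast joins.
    """
    if threshold_bytes < 0:
        return False
    return 0 <= table_size_bytes <= threshold_bytes


# Example: a 5 MB customers table qualifies, a 50 MB one does not.
small_ok = should_broadcast(5 * 1024 * 1024)
large_ok = should_broadcast(50 * 1024 * 1024)
```

In a real pipeline the size estimate would come from table statistics or file sizes. An explicit broadcast() hint asks Spark to broadcast regardless of the threshold, so a check like this is most useful before adding the hint.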
Automatic Issue Resolution
Lychee automatically identifies Spark pipeline issues and opens pull requests that implement the fixes
Performance Optimization
Lychee automatically detects and optimizes inefficiencies in your Spark pipelines
Managed Lychee
Enterprise-grade Spark Incident Response with zero infrastructure management