Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance


From foundational ETL and analytics to the frontier of generative AI, Apache Spark serves as the architectural backbone for global data processing. However, as data volumes scale, the trade-off between performance and infrastructure costs can be a limiting factor for growth. In the agentic era, where autonomous agents can trigger thousands of concurrent, multi-hop queries, this performance bottleneck directly dictates your unit economics.

We are excited to announce the general availability of Lightning Engine for Managed Service for Apache Spark, available across both our serverless and managed clusters deployment modes. Designed to address these scaling challenges directly, it is fully compatible with modern Spark workloads and requires zero changes to your existing data pipelines.

Whether you choose the zero-ops simplicity of our serverless deployment mode or the fine-grained infrastructure control of our managed clusters deployment mode, Lightning Engine serves as the unified performance engine to supercharge your job execution. By validating Lightning Engine across more than one million real-world workloads, we have fine-tuned it for industrial-grade stability as well as reliable performance gains.

With this general availability release, Lightning Engine delivers:

Up to 4.9x faster performance than standard open-source Spark
2x the price-performance of the leading high-speed Spark alternative

Let’s take a closer look at how Managed Service for Apache Spark achieves these results.

Under the hood: Vectorized native execution

Traditional Spark execution is often bottlenecked by JVM execution overhead and garbage collection pauses. Lightning Engine bypasses these limitations by compiling Spark physical query plans into native C++ instructions optimized for Single Instruction, Multiple Data (SIMD) vectorization.

Built on the open-source Gluten and Velox runtimes with specialized Google-engineered enhancements, this native execution layer accelerates your most demanding data processing tasks with:

Vectorized sort: Accelerates sorting operations by processing data columnarly in native memory, significantly reducing CPU cycle overhead.
Accelerated window functions: Speeds up calculations performed across sets of rows (such as moving averages, aggregations, and deduplication) by executing them directly within the native C++ layer.
Smart fallback: If a query contains an operator or custom Java UDF that is not natively supported, the engine's intelligent push-down layer automatically and gracefully transitions that specific sub-tree back to the JVM, avoiding unnecessary data format conversions and preserving overall execution stability.
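
To make the smart fallback idea concrete, here is a minimal Python sketch. It is not Lightning Engine's planner: the operator names, the `NATIVE_OPS` set, and both helper functions are hypothetical and chosen only for illustration. The sketch tags each operator in a toy plan tree as native or JVM, then counts the parent/child edges where data would have to change format.

```python
from dataclasses import dataclass, field

# Hypothetical set of operators the native layer can run.
# This is an assumption for the sketch, not the real support matrix.
NATIVE_OPS = {"Scan", "Filter", "Project", "Sort", "Window", "HashAggregate"}


@dataclass
class PlanNode:
    op: str
    children: list = field(default_factory=list)


def assign_backends(node):
    """Return a nested (op, backend, children) tuple, tagging each operator."""
    backend = "native" if node.op in NATIVE_OPS else "jvm"
    return (node.op, backend, [assign_backends(c) for c in node.children])


def count_transitions(tagged):
    """Count edges where data must cross the native/JVM boundary."""
    _, backend, children = tagged
    total = 0
    for child in children:
        if child[1] != backend:
            total += 1  # one format conversion on this edge
        total += count_transitions(child)
    return total


# Toy plan: Sort <- JavaUDF <- Filter <- Scan (the UDF is not natively supported).
plan = PlanNode("Sort", [PlanNode("JavaUDF", [PlanNode("Filter", [PlanNode("Scan")])])])
tagged = assign_backends(plan)
transitions = count_transitions(tagged)
```

In this toy plan only the `JavaUDF` node falls back, so the data crosses the boundary twice. A real engine would also weigh conversion cost when deciding how much of the surrounding sub-tree to move, which this sketch does not model.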

Optimized Cloud Storage and BigQuery connectors

High-performance compute is useless if the engine is starved for data. With Lightning Engine, we’ve optimized our storage connectors to ensure that reading data from Cloud Storage and BigQuery isn’t the bottleneck. Optimizations include:

Direct path connection: Bypasses multiple node hops and uses bi-directional streaming with Cloud Storage. This allows seek operations and vectorized readV APIs to run without reopening streams, accelerating scan times for complex, deeply nested Parquet or ORC files.
Metadata call reduction: Managing large-scale partitioned tables often comes with a hidden performance tax: the time spent simply listing files. Lightning Engine utilizes lexicographic listing in the driver to collect metadata and transmit it directly to executors, eliminating redundant Cloud Storage API calls and dramatically reducing Cloud Storage metadata costs.
Native BigQuery connector: Directly consumes BigQuery data in Arrow format. By avoiding the expensive conversion from Arrow to JVM UnsafeRow, the engine eliminates serialization overhead to accelerate scan times.
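
The metadata call reduction described above can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration: `FakeBucket` is an in-memory stand-in for an object store, the `table/` layout and partition names are invented, and real listings are paginated. The sketch only compares one list call per partition against a single lexicographic listing that is grouped by partition before being handed to executors.

```python
from collections import defaultdict


class FakeBucket:
    """In-memory stand-in for an object store that counts list calls."""

    def __init__(self, paths):
        self._paths = sorted(paths)  # keys are returned in lexicographic order
        self.list_calls = 0

    def list(self, prefix):
        self.list_calls += 1
        return [p for p in self._paths if p.startswith(prefix)]


def per_partition_listing(bucket, partitions):
    """Baseline: one list call per partition directory."""
    return {part: bucket.list(f"table/{part}/") for part in partitions}


def driver_side_listing(bucket, table_prefix="table/"):
    """One listing over the table prefix, grouped by partition directory."""
    files_by_partition = defaultdict(list)
    for path in bucket.list(table_prefix):
        partition = path[len(table_prefix):].split("/", 1)[0]
        files_by_partition[partition].append(path)
    return dict(files_by_partition)


paths = [
    "table/dt=2024-01-01/part-0.parquet",
    "table/dt=2024-01-01/part-1.parquet",
    "table/dt=2024-01-02/part-0.parquet",
    "table/dt=2024-01-03/part-0.parquet",
]
partitions = ["dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03"]

baseline_bucket = FakeBucket(paths)
baseline = per_partition_listing(baseline_bucket, partitions)

driver_bucket = FakeBucket(paths)
grouped = driver_side_listing(driver_bucket)
```

Both approaches produce the same file-to-partition mapping, but the baseline issues one call per partition while the driver-side version issues one call in total. With thousands of partitions, that difference is where the savings in API calls would come from.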

Broadcast joins and advanced query optimization

Lightning Engine incorporates an advanced, cost-based query optimizer inspired by Google's F1 and Spanner query engines, and introduces several custom optimization rules. Examples include:

Single HashTable caching: In standard broadcast joins, Spark builds join hash tables repeatedly across tasks. Lightning Engine builds the hash table once per executor and caches it, eliminating redundant CPU cycles and reducing the executor's memory footprint.
Aggregation pushdown: Automatically pushes partial aggregations below join shuffles. This minimizes the volume of data that must be transferred across the network, drastically reducing expensive shuffle stages.
Auto shuffle partitioning: Adaptively determines the number of shuffle partitions for each query stage based on runtime statistics, helping avoid out-of-memory (OOM) errors and disk spills without over-partitioning.
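
As a rough illustration of why aggregation pushdown reduces shuffle volume, the following Python sketch computes total order amount per region in two ways. The tables, column names, and function names are hypothetical, and the "shuffled rows" counter is a simplified proxy for network transfer, not a measurement of Lightning Engine's behavior.

```python
from collections import defaultdict

# Hypothetical fact table: (customer_id, amount).
orders = [("c1", 10.0), ("c1", 5.0), ("c2", 7.0), ("c1", 1.0), ("c2", 3.0), ("c3", 4.0)]
# Hypothetical dimension table: customer_id -> region.
customers = {"c1": "EU", "c2": "US", "c3": "EU"}


def join_then_aggregate(orders, customers):
    """Baseline: every order row crosses the shuffle before the join."""
    shuffled_rows = len(orders)
    totals = defaultdict(float)
    for cid, amount in orders:
        totals[customers[cid]] += amount
    return dict(totals), shuffled_rows


def aggregate_then_join(orders, customers):
    """Pushdown: partially aggregate by join key, then shuffle and join."""
    partial = defaultdict(float)
    for cid, amount in orders:
        partial[cid] += amount  # partial sum per customer, computed before the shuffle
    shuffled_rows = len(partial)
    totals = defaultdict(float)
    for cid, subtotal in partial.items():
        totals[customers[cid]] += subtotal
    return dict(totals), shuffled_rows


baseline_totals, baseline_rows = join_then_aggregate(orders, customers)
pushdown_totals, pushdown_rows = aggregate_then_join(orders, customers)
```

Both plans return the same totals per region, but the pushdown plan sends one row per customer through the shuffle instead of one row per order. The benefit grows with the number of rows per join key, and it shrinks toward zero when keys are nearly unique.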


Learn more technical details and hear about Lowe’s experience with Lightning Engine from Google Cloud Next ’26.

Getting started

These updates are live and ready to use today! You can enable Lightning Engine directly through the Google Cloud console or via the gcloud CLI.

To submit a serverless batch job with Lightning Engine enabled, specify the premium tier in your Spark properties:

To spin up a new managed cluster with Lightning Engine and Native Query Execution (NQE) enabled, run the following command in your terminal:

Alternatively, navigate to the Managed Service for Apache Spark page in the Google Cloud console, click Create Cluster, select Cluster on Compute Engine, and choose Lightning Engine under the cluster configuration settings to automatically activate query acceleration for your workloads.
