Monitoring a quantum network on AWS

A video of the authors discussing the solution.

This post was contributed by Alba Vendrell (LuxQuanta) and Juan Moreno (AWS).

Quantum Key Distribution (QKD) uses the quantum properties of photons (particles of light) to generate and distribute highly secure encryption keys. While its unique security advantages have attracted significant interest, large-scale deployments face challenges related to managing performance monitoring and infrastructure complexity. AWS provides scalability and flexible resource allocation, simplifying the integration of QKD networks with existing classical IT infrastructure. This integration can be a valuable step toward broader adoption of QKD technology.

In this post, we outline the steps we took at LuxQuanta to streamline the management of QKD devices using AWS infrastructure automation tools. We will demonstrate how the right toolset enables centralized management of key metadata and network activity, offering insights into network performance and health. We provide detailed coverage of the infrastructure implementation steps that are purely classical. If you are more interested in QKD key performance metrics, you can skip to the section “Extracting metadata from the NOVA LQ system.”

Background

As a European QKD manufacturer, LuxQuanta’s primary mission is to develop and commercialize advanced Quantum Cryptography technologies that are easy to install in conventional communication infrastructure while meeting the highest security standards.

Our new system, NOVA LQ, is based on a variation of QKD known as continuous-variable QKD, or CV-QKD. Continuous-variable technology, in contrast to discrete-variable (DV-QKD), offers a series of features that benefit the practical mass deployment of QKD technology within current classical networks. The quantum channel, where quantum communication between QKD pairs occurs, can coexist in the same optical fiber and optical band alongside classical communications, removing the need for specific dark fiber deployments and reducing costs.

The practical implementation of CV-QKD technology in real-world settings requires a classical layer for infrastructure management, system monitoring, and scalability. Using the AWS Cloud Development Kit (CDK), we can define our cloud infrastructure in code, significantly simplifying the provisioning processes. This approach allows the consistent application of security measures and integrates seamlessly into our development workflows, enhancing the system’s scalability—by automatically adjusting resources based on demand—and facilitating replication by customers interested in deploying quantum cryptography.

Furthermore, adopting AWS CDK aligns with modern development practices, offering benefits beyond scalability and ease of replication. It reduces the costs and complexities associated with physical infrastructure and manual maintenance. Consequently, LuxQuanta’s system emerges not only as a trailblazer in secure communication but also as a model of efficiency and innovation in the quantum cryptography domain.

Setting Up QKD device monitoring on AWS

On the operations team at LuxQuanta, we built a CDK script to deploy the base infrastructure for a comprehensive monitoring environment for our QKD devices. The application layer for monitoring, log management, and visualization is separated from user authentication and the underlying storage of device metadata. This separation enhances scalability and security while allowing flexible management of additional QKD devices. Isolating user authentication and device metadata storage facilitates the integration of NOVA LQ devices into our customers’ infrastructure, enabling secure communication and data exchange.

Figure 1 – The diagram depicts a generic quantum network (top) with NOVA LQ CV-QKD devices. It connects via a VPN to a 3-tier infrastructure in AWS that hosts the monitoring, alerting, and log management functions. A bastion host gives admin users access to run maintenance and update tasks.

Prerequisites

Before diving into the specifics of our CDK script and its deployment, check that your development environment is correctly set up and that you’re using any Linux OS (Ubuntu was our preferred choice). For Windows users, a Linux environment can be established via WSL (Windows Subsystem for Linux) or by using a virtual machine. Additionally, set up your AWS CLI credentials following the AWS CLI documentation guidelines.

AWS CDK is required and can be installed by running npm install -g aws-cdk in your terminal. Your system must also have Node.js, which can be downloaded from its official website. You must install the TypeScript extension that CDK uses, and then bootstrap your environment in an empty folder, specifying your AWS account and AWS Region where you’ll build your automation. You can follow the AWS CDK documentation for detailed steps.

Values to change before running the code

After setting up the environment, you must adjust certain values in the CDK script provided here to tailor it to your AWS setup and security requirements. Before deploying, modify the following parameters:

  1. SSH Key Pair: Replace the keyName parameter with your actual SSH key name to facilitate secure access to your instances. You might need to create a key pair in the Amazon EC2 console if you don’t have one.
  2. IP Ranges/CIDR Blocks: Review and update any specified CIDR blocks or IP addresses in the security group rules to match your network configurations. For instance, the ingress rule for the bastion host should only reflect your office network or the specific IP range you intend to allow SSH access from.

By making these adjustments, you’ll align the deployment with your organizational security policies and networking setup, integrating the QKD monitoring system into your AWS environment.
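As a rough illustration of the second adjustment, the check below rejects CIDR blocks that would open the bastion host too widely before they reach a security group rule. It is a minimal sketch, not part of the CDK script: the function name `checkBastionCidr` and the default minimum prefix length of /16 are assumptions chosen for the example, and it only handles IPv4.

```typescript
// Hypothetical pre-deployment check; not part of the original CDK script.
interface CidrCheck {
  valid: boolean;
  reason: string;
}

// Accepts an IPv4 CIDR string and rejects malformed or overly broad ranges.
// The /16 default is an illustrative policy choice, not an AWS requirement.
function checkBastionCidr(cidr: string, minPrefixLength: number = 16): CidrCheck {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(cidr.trim());
  if (match === null) {
    return { valid: false, reason: "not an IPv4 CIDR block" };
  }
  const octets = match.slice(1, 5).map(Number);
  const prefix = Number(match[5]);
  if (octets.some((octet) => octet > 255) || prefix > 32) {
    return { valid: false, reason: "octet or prefix length out of range" };
  }
  if (prefix < minPrefixLength) {
    return { valid: false, reason: `prefix /${prefix} is broader than /${minPrefixLength}` };
  }
  return { valid: true, reason: "ok" };
}

// 203.0.113.0/24 is a documentation range used here as a stand-in for an office network.
console.log(checkBastionCidr("203.0.113.0/24"));
console.log(checkBastionCidr("0.0.0.0/0"));
```

Running a check like this in the deployment pipeline is one way to catch an accidental 0.0.0.0/0 SSH rule before it is provisioned.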

Provisioning resources

To facilitate the deployment of a secure and scalable environment, the AWS CDK script is structured as follows.

VPC and subnets: This section focuses on setting up a Virtual Private Cloud (VPC) and defining both public and private subnets. The VPC is crucial for isolating and providing networking capabilities to the QKD device monitoring setup. The configuration includes a VPC with a specified maximum number of Availability Zones and NAT gateways, along with subnet configurations tailored for public and private access.

Bastion host: A bastion host is established within the public subnet to enable secure SSH access to instances within the private subnet. This setup is designed to limit access to system administrators, providing a secure entry point to the infrastructure and facilitating activity logging.

Application Load Balancer (ALB) and Auto Scaling Group (ASG): This segment introduces the setup of an ALB and an ASG to manage and scale the monitoring resources effectively. The ALB is configured to listen on the specific port that the application layer needs, and forward traffic to the ASG, ensuring the high availability and scalability of the solution.

Compute: This portion addresses the compute requirements necessary for the system’s scalability, focusing on the criteria for spinning up new instances based on network requests. It emphasizes the use of network requests per minute as a scaling parameter, rather than CPU or memory usage, to align better with the monitoring application’s needs. It uses the latest version of the Amazon Linux Amazon Machine Image (AMI), but you can point to any AMI ID that is available in the AWS Region you use. At LuxQuanta, we created our own AMI, so the service can scale dynamically without having to provision software and configuration when we spin up a new instance.
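To make the request-based scaling criterion concrete, the sketch below computes how many instances a given request rate would call for. It is a deliberately simplified model, assuming a fixed per-instance target and hard minimum and maximum sizes; the function name and the numbers are hypothetical, and a real Auto Scaling policy also applies cooldowns and instance warm-up.

```typescript
// Simplified model of request-based scaling; names and values are illustrative.
function desiredInstanceCount(
  requestsPerMinute: number,
  targetRequestsPerInstance: number,
  minInstances: number,
  maxInstances: number
): number {
  if (targetRequestsPerInstance <= 0) {
    throw new RangeError("targetRequestsPerInstance must be positive");
  }
  // Enough instances so that each one stays at or below the target rate.
  const needed = Math.ceil(requestsPerMinute / targetRequestsPerInstance);
  // Clamp to the group's configured bounds.
  return Math.min(maxInstances, Math.max(minInstances, needed));
}

// 2,500 requests per minute with a target of 1,000 per instance, bounded to 1..4 instances.
console.log(desiredInstanceCount(2500, 1000, 1, 4));
```

Scaling on request rate rather than CPU fits a workload whose load is driven by the number of devices pushing telemetry, which is the reasoning given above.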

Database backend: An Amazon Relational Database Service (RDS) instance is provisioned within the same private subnet for securely storing quantum-related metadata, such as metrics and device information used for monitoring and visualization purposes. This setup permits the secure handling of sensitive information related to the QKD devices.

After reviewing the basic AWS infrastructure and configuring the QKD devices to send traffic to the appropriate ALB endpoint, we’ll cover the solutions that we used within the application layer to enhance monitoring, alarming, notifications, time series, and log management.

Monitoring and alarming

Monitoring and alarming tools are crucial for tracking the status and performance of QKD devices, providing real-time insights and notifications about any irregularities. Popular open-source options include Prometheus and Grafana, which together cover metric collection, visualization, and alerting. Our devices use a Grafana-based telemetry system for data visualization, with Loki and Prometheus (plus custom exporters) as system data sources.

The system collects logs and metrics and plots them in Grafana panels, which supports real-time observation and analysis of key metrics and gives a quick view of the system’s operational efficiency. This provides real-time system health information for various components, which we cover later in the metadata section of this post. It also allows monitoring the state of the different QKD layers through an interface built on AWS auto scaling and load balancer functionality.

Figure 2 – In this figure, we can see a summary of some of the panels of interest in QKD that allow monitoring the solution. The metrics shown are Excess Noise Variance, Vmod, Transmittance, Postprocessing values, and the Secret Key Rate (SKR).

We configured the dashboards to use variables to target the desired hosts, so the same panels keep working as resources scale horizontally. Both the data sources and dashboards are managed by the Grafana provisioning system, so automatic scaling can cover this service along with the other frontend services that we detail in the coming sections.

The repository tree looks as follows:

    grafana-repo/
    |-- dashboards/
    |   |-- dashboard_cvqkdtelemetry.json
    |   |-- dashboard_logs.json
    |   |-- dashboard_node_exporter.json
    |-- Dockerfile
    |-- grafana-config.yml
    |-- grafana-provisioning.yml

These are the contents of the listed configuration files:

Dockerfile: Specifies the base Grafana image and sets up the environment by copying the Grafana configuration and dashboard provisioning files into the container. It essentially defines how the Grafana server should be built within a Docker container.

    FROM grafana/grafana:9.5.6
    COPY grafana-config.yml /etc/grafana/provisioning/datasources/grafana-config.yaml
    COPY grafana-provisioning.yml /etc/grafana/provisioning/dashboards/grafana-provisioning.yaml
    COPY dashboards /etc/grafana/provisioning/dashboards/

grafana-config.yml: Contains the data source configurations for Grafana, specifying how Grafana should connect to metrics databases like Prometheus and logs databases like Loki. It includes details such as the database URL, access method, and any other necessary configuration details for connecting to the data sources.

    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: ${PROMETHEUS_URL}
        jsonData:
          timeInterval: "5s"
      - name: Loki
        type: loki
        access: proxy
        url: ${LOKI_URL}
        jsonData:
          maxLines: 1000
          httpMethod: POST

grafana-provisioning.yml: Outlines the dashboard provisioning settings, detailing how Grafana should automatically load and update dashboards from specified files or directories. This file helps in automating the setup of Grafana dashboards and data sources, making the deployment reproducible and consistent.

    apiVersion: 1
    providers:
      # <string> an unique provider name. Required
      - name: 'provisioning-sys'
        # <int> Org id. Default to 1
        orgId: 1
        # <string> name of the dashboard folder.
        folder: ''
        # <string> folder UID. will be automatically generated if not specified
        folderUid: ''
        # <string> provider type. Default to 'file'
        type: file
        # <bool> disable dashboard deletion
        disableDeletion: false
        # <int> how often Grafana will scan for changed dashboards
        updateIntervalSeconds: 10
        # <bool> allow updating provisioned dashboards from the UI
        allowUiUpdates: false
        options:
          # <string, required> path to dashboard files on disk. Required when using the 'file' type
          path: /etc/grafana/provisioning/dashboards
          # <bool> use folder names from filesystem to create folders in Grafana
          foldersFromFilesStructure: true

Using Grafana’s native alerting system for proactive monitoring automates anomaly detection and enables timely responses to critical events and issues. We also recommend migrating Grafana’s default local SQLite database to an external database such as MySQL, so that the database can be restored quickly and easily after a failure. Tools such as database-migrator can help with this. We used the Amazon RDS service as detailed earlier.

Time series database

A reliable time series database is essential for storing historical data and metrics generated by QKD devices. For our system, we used Prometheus, a time series database that operates with custom and third-party exporters to handle logs and system metrics efficiently. This combination allowed us to query time series data focused on dynamic and recent data and metrics generated by our devices, as well as monitor and analyze detailed logs and system metrics in real time. Prometheus’ flexibility and ability to adapt to different types of data make it a valuable tool for a comprehensive time series monitoring and storage strategy.

Regarding third-party exporters, Prometheus’ Node Exporter exposes a wide variety of hardware- and kernel-related metrics, while its preconfigured dashboard simplifies deployment significantly. It is maintained as part of the official Prometheus GitHub organization, which makes it a desirable choice for enterprise and production environments.

prometheus_config.yml: Holds the configuration schema and ensures that Prometheus collects data across the infrastructure for monitoring and alerting purposes.

    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ["prometheus_dns:9090"]
      - job_name: node-exporter
        static_configs:
          - targets: ["node_exporter_dns:9100"]
      - job_name: custom-exporter
        static_configs:
          - targets: ["node_exporter_dns:$custom-exporter-port"]

Grafana provides out-of-the-box support for Prometheus, which makes it a good fit for NOVA LQ CV-QKD needs. The compatibility between Prometheus and Grafana is widely recognized in the industry, making this combination an effective and well-established choice for comprehensive monitoring and visualization solutions. Because the pair is so widely used, it gives us a reliable and optimized experience for our monitoring and supervision requirements, backed by detailed documentation and solid online support.

You can also include an alerting system through Prometheus Alertmanager in this layer of the system by adding the following lines to the same file:

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093
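Once Alertmanager is wired in, alerting rules decide when a notification is sent. The sketch below models how a rule with a hold duration (the `for` clause in Prometheus rules) moves between the inactive, pending, and firing states across successive evaluations. It is a simplified illustration of that behavior, not Prometheus code; the function name and the example values are assumptions.

```typescript
type AlertState = "inactive" | "pending" | "firing";

// Given the rule condition at each evaluation, return the alert state at each step.
// An alert fires once the condition has held continuously for at least forSeconds.
function alertStates(
  conditionPerEvaluation: boolean[],
  evaluationIntervalSeconds: number,
  forSeconds: number
): AlertState[] {
  const states: AlertState[] = [];
  let activeSeconds = -1; // -1 means the condition is currently not active
  for (const conditionTrue of conditionPerEvaluation) {
    if (!conditionTrue) {
      activeSeconds = -1;
      states.push("inactive");
      continue;
    }
    // First true evaluation starts the clock at 0; later ones add one interval.
    activeSeconds = activeSeconds < 0 ? 0 : activeSeconds + evaluationIntervalSeconds;
    states.push(activeSeconds >= forSeconds ? "firing" : "pending");
  }
  return states;
}

// Example: 15 s evaluation interval (as in the configuration above) and a hypothetical 30 s hold.
console.log(alertStates([false, true, true, true, false, true], 15, 30));
```

A hold duration like this avoids paging on a single noisy sample, at the cost of a short delay before a real failure is reported.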

Log management

Log management tools are essential for collecting, storing, and analyzing logs generated by QKD devices. They assist in troubleshooting issues and maintaining security. Open-source options like OpenSearch offer powerful log management and analysis capabilities. There are also commercial alternatives.

Opting for custom logging in code is a compelling option for organizations looking for a more personalized approach to records management. For NOVA LQ CV-QKD, we actively sought this custom logging strategy. We adapted our monitoring approach by directly instrumenting the application to expose metrics to Prometheus and visualize them through Grafana. This provides flexibility in gathering specific metrics based on custom requirements but could add processing overhead and require more specific development and dependencies.
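To give a sense of what instrumenting an application for Prometheus involves, the sketch below renders a single gauge in the Prometheus text exposition format, which is what a custom exporter serves on its metrics endpoint. It is a hand-rolled illustration rather than our exporter: the metric name `qkd_secret_key_rate_kbps`, the `device` label, and the helper names are hypothetical, and the help text is assumed to contain no backslashes or newlines. A production exporter would normally use a Prometheus client library instead.

```typescript
interface Label {
  name: string;
  value: string;
}

interface GaugeSample {
  name: string;
  help: string;
  labels: Label[];
  value: number;
}

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one gauge as HELP and TYPE comment lines followed by the sample line.
function renderGauge(sample: GaugeSample): string {
  const labelPairs = sample.labels
    .map((label) => `${label.name}="${escapeLabelValue(label.value)}"`)
    .join(",");
  const labelBlock = labelPairs.length > 0 ? `{${labelPairs}}` : "";
  const lines = [
    `# HELP ${sample.name} ${sample.help}`,
    `# TYPE ${sample.name} gauge`,
    `${sample.name}${labelBlock} ${sample.value}`,
  ];
  return lines.join("\n") + "\n";
}

// Hypothetical metric and label names, for illustration only.
console.log(
  renderGauge({
    name: "qkd_secret_key_rate_kbps",
    help: "Secret key rate in kilobits per second.",
    labels: [{ name: "device", value: "nova-lq-1" }],
    value: 4,
  })
);
```

Prometheus scrapes text like this on every scrape interval, so each exported value becomes one point in the corresponding time series.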

To enhance system log management, our infrastructure incorporates Loki. This widely recognized tool increases syslog capabilities and contributes to efficient log aggregation, searching, and visualization. Loki is easy to deploy and manage, making it an essential component of our stack.

Figure 3 – In this figure, we can find the post-processing logs displayed in a real-time panel. We can observe a warning indicating that the classical channel has no connectivity. As a result, the system is put on hold and a retry is attempted. If the connectivity error is resolved, the service is automatically resumed. Otherwise, with the assistance of centralized logs, we can detect and track the time of failure.

Extracting metadata from the NOVA LQ system

The NOVA LQ system generates a wealth of metadata through its control plane, the component responsible for managing the non-sensitive aspects of QKD. This metadata provides critical insights into the device’s performance, stability, and security, which are essential for monitoring and optimizing its operation. Key telemetry includes:

Excess Noise (snu): This parameter refers to all untrusted sources of noise in a QKD system above the calibrated shot noise (quantum noise) and electronic noise. Because eavesdropping would raise the excess noise, any increase in this parameter is attributed to the presence of an eavesdropper. It is measured in shot noise units (SNU).

Secret Key Rate (kbps): The secure key rate quantifies the number of secure bits generated per unit of time. It is a critical KPI for evaluating the practicality of the QKD system in real-world applications. The units are kilobits per second.

Transmittance: Quantum signals can experience loss as they travel through optical fibers. Evaluating the channel transmittance helps assess the system’s ability to maintain quantum states over distance, and it is a parameter that is also needed for the security proof in CV-QKD. The transmittance is typically expressed as a fraction value without units, and, in our representation, besides the channel losses, it also includes the detection efficiency, which is associated with the internal losses of the receiver module. The optical fiber losses in decibels (dB) can be directly obtained from the transmittance.

Post-processing Frame Error rate: Post-processing involves error correction and privacy amplification to enhance the security of generated keys. Efficient post-processing is essential for producing secure keys from raw data. The frame error rate is the ratio of incorrectly decoded data frames to the total number of transmitted frames during the QKD postprocessing. This parameter impacts the secret key rate.

Environmental Factors: Considerations such as temperature, humidity, and other environmental conditions can affect the performance of QKD systems. Monitoring these factors is essential for maintaining system integrity. The KPIs monitored by the systems are temperature in degrees Celsius (°C) and humidity (in percentage).

Device Stability: The stability is related to interruptions in the generation of secret keys; frequent interruptions impact the overall performance of the system and can prevent applications from extracting keys from the QKD devices. The stability is typically inferred from the time-series of metrics such as secret key rate, excess noise, and transmittance.

Uptime: This metric is associated with system stability and represents the amount of time the system is operational and available without interruptions. It can be aggregated in Grafana for visualization over time intervals ranging from seconds to months.
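As an illustration of how device stability and uptime can be inferred from a time series, the sketch below summarizes a list of secret key rate samples. It assumes evenly spaced samples (for example, one per scrape interval) and treats any sample with a key rate above zero as "up"; both assumptions, and the function name, are ours for the example rather than the NOVA LQ definition.

```typescript
interface StabilitySummary {
  uptimeFraction: number; // share of samples with key generation active
  interruptions: number;  // number of observed up-to-down transitions
}

// Summarize evenly spaced secret key rate samples (kbps).
function summarizeStability(secretKeyRateKbps: number[]): StabilitySummary {
  if (secretKeyRateKbps.length === 0) {
    return { uptimeFraction: 0, interruptions: 0 };
  }
  let upSamples = 0;
  let interruptions = 0;
  let previouslyUp = false; // a series that starts "down" is not counted as an interruption
  for (const rate of secretKeyRateKbps) {
    const up = rate > 0;
    if (up) {
      upSamples += 1;
    } else if (previouslyUp) {
      interruptions += 1;
    }
    previouslyUp = up;
  }
  return { uptimeFraction: upSamples / secretKeyRateKbps.length, interruptions };
}

// Made-up samples: key generation stops twice.
console.log(summarizeStability([4, 4, 0, 0, 3, 0, 5]));
```

In Grafana the same aggregation would typically be expressed as a query over the stored series; the function only shows the arithmetic behind it.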

Integrating these metrics into routine monitoring and management practices enables LuxQuanta’s customers to proactively manage their QKD devices, ensuring they operate at peak performance and security levels within the larger monitoring ecosystem.
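Several of the metrics above are related by simple arithmetic, which the following sketch spells out. It assumes the reported transmittance is the product of the fiber transmittance and the detection efficiency, that a kilobit is 1,000 bits, and that keys are consumed in 256-bit blocks; the helper names and sample values are hypothetical and are not part of the NOVA LQ interface.

```typescript
// Illustrative helpers; names, defaults, and sample values are assumptions.

// Loss in dB for a transmittance T in (0, 1]: loss = 10 * log10(1 / T).
function transmittanceToLossDb(transmittance: number): number {
  if (!(transmittance > 0 && transmittance <= 1)) {
    throw new RangeError("transmittance must be in (0, 1]");
  }
  return 10 * Math.log10(1 / transmittance);
}

// The reported transmittance includes detection efficiency, so divide it out
// to isolate the fiber contribution: T_fiber = T_total / efficiency.
function fiberLossDb(totalTransmittance: number, detectionEfficiency: number): number {
  if (!(detectionEfficiency > 0 && detectionEfficiency <= 1)) {
    throw new RangeError("detectionEfficiency must be in (0, 1]");
  }
  return transmittanceToLossDb(totalTransmittance / detectionEfficiency);
}

// Frame error rate: incorrectly decoded frames over total transmitted frames.
function frameErrorRate(failedFrames: number, totalFrames: number): number {
  if (totalFrames <= 0 || failedFrames < 0 || failedFrames > totalFrames) {
    throw new RangeError("need 0 <= failedFrames <= totalFrames and totalFrames > 0");
  }
  return failedFrames / totalFrames;
}

// How many keys of a given length a secret key rate can supply per second.
function keysPerSecond(secretKeyRateKbps: number, keyLengthBits: number = 256): number {
  if (secretKeyRateKbps < 0 || keyLengthBits <= 0) {
    throw new RangeError("rate must be non-negative and key length positive");
  }
  return (secretKeyRateKbps * 1000) / keyLengthBits;
}

// 5% total transmittance with 50% detection efficiency: about 10 dB of fiber loss.
console.log(fiberLossDb(0.05, 0.5));
console.log(frameErrorRate(5, 100));
console.log(keysPerSecond(4));
```

Derived values like these can be computed in a dashboard panel from the raw series, so they do not need to be exported separately.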

This approach adds a classical management layer that enhances the reliability of the QKD devices, and aligns with the automated deployment of secure communication technologies.

Insights in action: a time series example

As we have already shown in previous figures and sections, monitoring plays a crucial role in verifying the status of our systems and services. Effective real-time monitoring allows us to identify critical moments of failure. In the following figure, for instance, we observe how a system that seemed stable at first shows some instability between 8:00 and 12:00.

Figure 4 – Excess noise variance (SNU) and transmittance panels, where we can observe the moment when the system exhibited instability.

This view allows for debugging the system within the displayed time intervals by considering the physical or logical components of the different layers that may be influencing instabilities. Each panel can be configured to display different metrics, providing a comprehensive overview of the system’s state.
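The same kind of inspection can be done programmatically. The sketch below scans a time-ordered series and returns the intervals during which a metric stayed above a threshold, which is one simple way to locate a window of instability such as the one described above. The threshold and sample values are made up for the example, and the input is assumed to be sorted by timestamp.

```typescript
interface TimedSample {
  timestampMs: number;
  value: number;
}

interface Interval {
  startMs: number;
  endMs: number;
}

// Return contiguous runs of samples whose value exceeds the threshold.
// Assumes samples are sorted by timestamp.
function findIntervalsAbove(samples: TimedSample[], threshold: number): Interval[] {
  const intervals: Interval[] = [];
  let current: Interval | null = null;
  for (const sample of samples) {
    if (sample.value > threshold) {
      if (current === null) {
        current = { startMs: sample.timestampMs, endMs: sample.timestampMs };
      } else {
        current.endMs = sample.timestampMs;
      }
    } else if (current !== null) {
      intervals.push(current);
      current = null;
    }
  }
  if (current !== null) {
    intervals.push(current);
  }
  return intervals;
}

// Made-up excess noise samples (SNU), one per second, with a hypothetical threshold of 0.03.
const noiseSamples: TimedSample[] = [0.01, 0.01, 0.05, 0.06, 0.01, 0.07].map(
  (value, index) => ({ timestampMs: index * 1000, value })
);
console.log(findIntervalsAbove(noiseSamples, 0.03));
```

The returned intervals can then be cross-referenced with the centralized logs for the same time range.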

The provided example is a basic scenario where we validate the state of a system in a simple context. However, the true power of using cloud technologies becomes apparent when we consider their impact on centralized data collection and monitoring for multiple systems. We can easily scale our infrastructure to handle large volumes of data without the need for significant upfront investments in hardware, ensuring the performance and reliability of our services as our data needs grow.

A real-case scenario deployed in our laboratories involved six different QKD systems, connected in the same network, with real-time analytics and insights displayed on a centralized Grafana server. By using AWS cloud technologies, we were able to centralize data collection from all six systems, providing a unified and comprehensive view of system performance. AWS provides us the ability to scale infrastructure effortlessly as our needs grow, and to automate our monitoring tasks. This scalability helps us to handle increasing amounts of data without compromising performance, something more difficult to achieve with traditional on-premises solutions.
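For a rough picture of what centralizing data from several systems means at the data level, the sketch below takes samples tagged with a device name and keeps the most recent one per device, which is the kind of "last value" a multi-system dashboard panel displays. The device names and values are invented, and in our setup this aggregation is performed by the monitoring stack rather than by application code like this.

```typescript
interface DeviceSample {
  device: string;
  timestampMs: number;
  secretKeyRateKbps: number;
}

// Keep the most recent sample for each device.
function latestPerDevice(samples: DeviceSample[]): Map<string, DeviceSample> {
  const latest = new Map<string, DeviceSample>();
  for (const sample of samples) {
    const seen = latest.get(sample.device);
    if (seen === undefined || sample.timestampMs > seen.timestampMs) {
      latest.set(sample.device, sample);
    }
  }
  return latest;
}

// Hypothetical samples from two devices.
const fleetSamples: DeviceSample[] = [
  { device: "qkd-1", timestampMs: 1000, secretKeyRateKbps: 4.0 },
  { device: "qkd-2", timestampMs: 1000, secretKeyRateKbps: 3.5 },
  { device: "qkd-1", timestampMs: 2000, secretKeyRateKbps: 4.2 },
];
latestPerDevice(fleetSamples).forEach((sample, device) => {
  console.log(device, sample.secretKeyRateKbps);
});
```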

Figure 5 – Centralized monitoring for the six systems, including excess noise, transmittance, secret key rate (SKR), last SKR, and losses. This example highlights how real-time monitoring can improve the understanding and operation of complex systems such as QKD, ensuring their reliability and efficiency in practical environments.

Conclusion

In this post, we shared the integration steps required to set up monitoring, alerting, and management for the NOVA LQ CV-QKD system using the AWS CDK and its associated services. You learned how to build a modular and layered infrastructure, extract and store QKD metadata, and configure dashboards to monitor the aggregated health of your systems.

For LuxQuanta, this capability translates into enhanced operational effectiveness, faster identification and resolution of issues, and a robust foundation for scaling our quantum cryptographic solutions. Using real-time insights and scalable cloud infrastructure, LuxQuanta is better positioned to innovate and maintain a competitive edge in the rapidly evolving field of quantum communications.

If you wish to dive deeper into QKD, LuxQuanta provides the full integration package as a suite of professional services available through the AWS Marketplace. As a member of the AWS Partner Network, we also support a range of AWS service integrations. You can find more details on our partner profile page.
