Linear programming (LP) is one of the most impactful tools in optimization and decision-making, widely applied across industries such as logistics, finance, energy, manufacturing, economics, and engineering. Despite its versatility in solving complex real-world problems, large-scale LP instances remain computationally challenging, even after decades of advancements in solver technology.
Traditionally, the two dominant approaches for solving LP problems have been the simplex method and interior-point methods (IPM). While the simplex method is effective for large, sparse LP problems, interior-point methods are preferred for numerically difficult cases. However, both methods struggle with particularly large and complex LP instances.
A well-known example is the zib03 LP problem, which took over four months to solve in 2009 using the barrier method and remains intractable for the simplex method. Even with modern CPU-based solvers like COPT, solving zib03 to optimality still required about 16 hours in 2023, reinforcing the notion that certain LP problems are inherently difficult for both simplex and IPM.
But this belief no longer holds.
At the end of 2023, a major breakthrough was achieved with GPU-accelerated first-order methods. The same zib03 LP instance that took 16 hours on a CPU-based solver was solved to optimality in just 15 minutes using cuPDLP-C, an open-source, GPU-accelerated LP solver running on NVIDIA H100 GPUs, created by researchers from COPT, the University of Chicago, and Stanford University [1].
| Solver | Hardware | Time to Optimality |
| --- | --- | --- |
| Barrier method (2009) | CPU | 4 months |
| COPT (2023) | Modern CPU | 16 hours |
| cuPDLP-C (2023) | NVIDIA H100 GPU | 15 minutes |

Table 1. Hard LP instances solved more than 60 times faster with cuPDLP-C.
Alongside these advancements, a major milestone for the optimization community was announced—NVIDIA cuOpt is becoming open-source. This move marks a new era of accessibility, collaboration, and innovation in GPU-accelerated solvers, empowering researchers, developers, and industry practitioners to integrate cuOpt into custom workflows and accelerate real-world optimization problems at scale.
Additionally, the potential integration of NVIDIA cuOpt with the COPT mixed-integer programming (MIP) solver is another exciting development. In a recent collaboration with the NVIDIA cuOpt team, a GPU-accelerated MIP solver was tested alongside the open-source HiGHS solver. The results were remarkable:
- 34% improvement in MIP solution quality within 5 minutes
- Reduction in the average optimality gap from 35.3% to 26.3%
This demonstrates how GPU-powered combinatorial optimization can significantly enhance real-time decision-making and large-scale problem solving. By integrating cuOpt with COPT, both teams have the opportunity to push the boundaries of real-time MIP solving and refine what’s possible in high-performance optimization.
A Third Path: GPU-Accelerated First-Order Methods for Linear Programming
The simplex method, invented in the 1940s, has been refined over the years and remains one of the most widely used LP algorithms. It solves at least two linear systems per iteration and moves along the edges of the feasible region until it reaches an optimal corner solution. While highly effective for many LP instances, its performance depends on the sparsity of the problem: dense problems quickly become impractical due to the O(n³) computational cost of solving linear systems. However, simplex does have an advantage in that it allows for an efficient warm start, making it useful for incremental problem modifications.
Interior-point methods, on the other hand, approach LP from a different perspective. Instead of iterating along the edges of the feasible region, they solve a linearized set of optimality conditions using Newton's method. These algorithms converge quickly, often requiring only tens to hundreds of iterations to reach a high-quality solution. However, each iteration requires the factorization of a large matrix, which can become prohibitively expensive for large-scale LP problems. While interior-point methods are powerful and extend naturally to convex optimization problems, they terminate in the interior of the feasible region, requiring an additional crossover step to obtain a corner solution.
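As a small, hedged illustration of the two classical approaches, the sketch below solves the same toy LP with SciPy's HiGHS-based dual simplex and interior-point backends. It assumes SciPy 1.6 or newer, where the `highs-ds` and `highs-ipm` method names are available, and is not tied to any solver discussed in this post.

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP: maximize x1 + 2*x2  subject to  x1 + x2 <= 4, 0 <= x1 <= 2, x2 >= 0.
# linprog minimizes, so we negate the objective.
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0]])
b_ub = np.array([4.0])
bounds = [(0, 2), (0, None)]

for method in ("highs-ds", "highs-ipm"):  # dual simplex vs. interior point
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method=method)
    print(method, res.fun, res.x)
```

Both backends return the same vertex on this tiny instance; on large sparse problems their behavior diverges in exactly the ways described above.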
A New Paradigm: First-Order Methods and GPU Acceleration
First-order methods break away from traditional reliance on matrix factorizations. Instead, they prioritize fast, lightweight iterations that involve sparse matrix-vector multiplications (SpMV) rather than solving large linear systems. While they typically require more iterations to converge compared to interior-point methods, each iteration is significantly more cost effective, making them well-suited for massive scale problems where matrix factorizations become impractical.
Why First-Order Methods Are a Game Changer
Computational Efficiency
- Traditional LP solvers require O(n³) operations for matrix factorizations.
- First-order methods rely on sparse matrix-vector multiplications (SpMV), whose cost scales with the number of nonzero elements (nnz) instead of n³, making them dramatically faster for sparse problems.
Parallelization and GPU Acceleration
- Sparse matrix-vector multiplications are inherently highly parallelizable, making them ideal for GPU acceleration.
- Unlike CPU-based solvers that struggle with large matrix factorizations, GPUs can perform SpMV operations orders of magnitude faster thanks to their massively parallel architectures and highly optimized sparse linear algebra libraries.
Scalability for Large LP Problems
- Many real-world LP problems involve millions or even billions of variables, making traditional solvers impractical.
- First-order methods can efficiently handle high-dimensional and ill-conditioned problems that are otherwise infeasible with simplex or interior-point methods.
GPU-Accelerated PDHG for Linear Programming: A New Benchmark for Large-Scale Optimization
Primal-Dual Linear Programming (PDLP) is a specialization of the Primal-Dual Hybrid Gradient (PDHG) method tailored for solving linear programs efficiently. cuPDLP-C is an open-source, GPU-accelerated implementation of PDLP written in C, designed to push the boundaries of large-scale optimization. The dramatic speedup observed for the zib03 LP instance, reducing solve time from 16 hours to just 15 minutes, was achieved using cuPDLP-C on NVIDIA GPUs.
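The PDHG iteration that PDLP builds on is compact enough to sketch. The toy NumPy implementation below is not the cuPDLP-C code: it omits PDLP's restarts, adaptive step sizes, and preconditioning, and simply alternates a projected primal gradient step with an extrapolated dual step for a standard-form LP, min cᵀx s.t. Ax = b, x ≥ 0.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=5000, tau=0.4, sigma=0.4):
    """Vanilla PDHG for min c^T x  s.t.  Ax = b, x >= 0.

    Requires tau * sigma * ||A||^2 < 1 for convergence.
    Each iteration costs two matrix-vector products; no factorizations.
    """
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)  # primal step, projected onto x >= 0
        y = y + sigma * (b - A @ (2.0 * x_new - x))        # dual step with extrapolation
        x = x_new
    return x, y

# Tiny example: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0; optimum is x = (1, 0).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, y = pdhg_lp(c, A, b)
print(x)  # approximately [1, 0]
```

Replacing the dense matrix-vector products with SpMV, and running them on a GPU, is precisely what makes this scheme attractive at the scales reported below.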
For years, some of the largest LP problems were ignored due to their intractability with traditional solvers, limiting the impact of optimization in data-intensive industries. However, our benchmarking on some of the most challenging LP models demonstrates that cuPDLP-C outperforms both simplex and interior-point methods for many large-scale instances, significantly expanding the applicability of optimization at scale.
Benchmarking cuPDLP-C: Real-World Performance Gains
To evaluate the efficiency of cuPDLP-C, Hans Mittelmann recently benchmarked it on the NVIDIA A6000 GPU. In that benchmark comparison, cuPDLP-C (at 1e-4 tolerance) ranked third overall. Moreover, on 16 undisclosed large-scale LP problems, cuPDLP-C was 3% faster than COPT, one of the top commercial LP solvers.
| Source | Instance | m | n | nnz | COPT [s] | cuPDLP-C [s] |
| --- | --- | --- | --- | --- | --- | --- |
| Koch et al. | zib03 | 19,731,970 | 29,128,799 | 104,422,573 | 16.5 (h) | 916 |
| Pagerank | rand_1m_nodes | 1,000,001 | 1,000,000 | 7,999,982 | - | 3.56 |
| | rand_10m_nodes | 10,000,001 | 10,000,000 | 79,999,982 | - | 44.22 |
| | com-livejournal | 3,997,963 | 3,997,962 | 77,358,302 | - | 21.07 |
| | soc-livejournal1 | 4,847,572 | 4,847,571 | 78,170,533 | - | 22.26 |
| Unit Com. | ds1 | 641,037 | 659,145 | 21,577,566 | 592 | 81 |
| | ds2 | 641,037 | 659,145 | 21,577,566 | 606 | 108 |
| Supply-chain | inv-10 | 4,035,449 | 3,758,458 | 15,264,380 | - | 1636 |
| | inv-20 | 8,368,795 | 7,810,584 | 31,718,673 | - | 1157 |
| | inv-40 | 13,186,756 | 12,066,105 | 49,528,729 | - | 6032 |
| | inv-60 | 16,227,780 | 14,544,689 | 60,372,404 | - | 11102 |
| QAP | wil50 | 3,437,600 | 6,252,500 | 19,125,000 | - | 96 |
| | lipa50a | | | | - | 71 |
| | lipa50b | | | | - | 66 |
| | tai50a | | | | - | 64 |
| | tai50b | | | | - | 437 |

Table 2. Performance on selected large-scale LP problems (solve times in seconds unless noted; "-" indicates no comparison time was reported).
MIPLIB 2017 Benchmarks: LP Relaxation Performance
Beyond solving standalone LP problems, one of the most important applications of LP solvers is their role in mixed-integer programming (MIP). When solving an MIP instance, solvers must process anywhere from tens to millions of LP relaxations, making LP solver efficiency crucial. One of the most important LP relaxations in MIP is the root relaxation, which significantly influences the total solution time. To test this, we evaluated the performance of PDLP on non-trivial LP relaxations from MIPLIB 2017, comparing it against simplex and interior-point methods.
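To make the notion of an LP relaxation concrete, here is a small, hedged sketch (SciPy assumed available; the instance is invented for illustration). Dropping the integrality requirement of a 0-1 knapsack turns the MIP into an LP whose optimal value bounds the integer optimum, which is exactly the role the root relaxation plays in an MIP solver.

```python
import numpy as np
from scipy.optimize import linprog

# 0-1 knapsack MIP: maximize 60*x1 + 100*x2 + 120*x3
# subject to 10*x1 + 20*x2 + 30*x3 <= 50, x binary.
values = np.array([60.0, 100.0, 120.0])
weights = np.array([[10.0, 20.0, 30.0]])
capacity = np.array([50.0])

# LP relaxation: replace x in {0, 1} with 0 <= x <= 1 (linprog minimizes).
res = linprog(-values, A_ub=weights, b_ub=capacity,
              bounds=[(0, 1)] * 3, method="highs")
lp_bound = -res.fun   # 240.0: fractional optimum x = (1, 1, 2/3)
mip_optimum = 220.0   # best integer solution: items 2 and 3
print(lp_bound, ">=", mip_optimum)
```

The gap between the two values (here 240 vs. 220) is what branch-and-bound must close, which is why fast LP solves for the relaxations matter so much for total MIP time.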
The table below shows the solution time (in seconds) for solving LP relaxations of selected MIPLIB 2017 instances, comparing Simplex, Interior-Point Methods (IPM), and PDLP.
| Instance | Variables | Constraints | Non-Zeros | Simplex [s] | IPM [s] | PDLP [s] |
| --- | --- | --- | --- | --- | --- | --- |
| *Open Pit Mining over a Cube considering Multiple Time Periods* | | | | | | |
| rmine11 | 12,292 | 97,389 | 241,240 | 4.6 | 2.8 | 4.0 |
| rmine13 | 23,980 | 197,155 | 485,784 | 8.6 | 4.8 | 5.9 |
| rmine15 | 42,438 | 358,395 | 879,732 | 34.2 | 14.2 | 7.8 |
| rmine21 | 162,547 | 1,441,651 | 3,514,884 | 902.4 | 215.5 | 36.1 |
| rmine25 | 326,599 | 2,953,849 | 7,182,744 | >3600.0 | 1988.2 | 384.5 |
| *Simplified Real-Life Satellite Scheduling Problem* | | | | | | |
| satellites2-60-fs | 35,378 | 16,516 | 125,048 | 4.6 | 0.9 | 7.0 |
| satellites2-40 | 35,378 | 20,916 | 283,668 | 47.6 | 5.9 | 3.5 |
| satellites3-25 | 81,681 | 44,804 | 698,176 | 127.0 | 33.8 | 4.8 |
| satellites4-25 | 95,637 | 51,712 | 821,192 | 244.2 | 43.4 | 7.7 |
| *Trip Timetabling Problem* | | | | | | |
| triptim1 | 30,055 | 15,706 | 515,436 | 9.8 | 3.7 | 11.2 |
| triptim2 | 27,326 | 14,427 | 521,898 | 81.4 | 4.1 | 12.0 |
| triptim4 | 27,226 | 14,361 | 520,532 | 54.8 | 3.7 | 7.6 |
| triptim7 | 27,342 | 14,427 | 521,944 | 78.5 | 4.9 | 14.8 |
| triptim8 | 27,342 | 14,427 | 521,945 | 74.2 | 5.7 | 20.8 |
Table 3. Solution time (seconds) of LP relaxation of MIPLIB 2017 instances, with Simplex, interior-point and first-order methods.
Key Insights from Benchmarking
- PDLP scales better than IPM for very large instances—In problems like rmine21 and rmine25, PDLP dramatically outperforms IPM as problem size increases.
- Satellite scheduling and trip timetabling results vary—PDLP is competitive but sometimes lags behind IPM, depending on the problem structure.
- All solvers reach optimality—The final solutions across all methods have the same quality, making runtime comparisons meaningful.
Real-World Application: SCUC for Electricity Production Planning
To further test the practical impact of cuPDLP-C, we applied it to a Security-Constrained Unit Commitment (SCUC) problem, a crucial optimization model used in electricity production planning:
- Long-term energy planning requires solving SCUC models daily, incorporating vast amounts of operational data.
- Short-term adjustments demand near-real-time solutions, requiring SCUC models to be solved within 15-minute intervals.
- The first LP relaxation in SCUC models often accounts for half of the total MIP solution time, making its efficiency a key bottleneck.
By combining cuPDLP-C with the simplex and barrier methods in COPT, we achieved a 20% average reduction in SCUC solution time, enabling faster, more efficient electricity production planning.
PDHG for General Conic Programs: Scaling Optimization Beyond LP
Conic linear programs (CLPs) emerged in the 1990s as a fundamental class of optimization problems, extending beyond traditional linear programming (LP). Among their key subclasses are second-order cone programs (SOCPs), exponential cone programs, and semidefinite programs (SDPs), all of which play a crucial role in modern optimization applications.
Historically, interior-point methods (IPM) have been the go-to approach for solving CLPs. However, as discussed earlier, these methods rely on solving linear systems of equations at each iteration, making them computationally prohibitive for large-scale problems. This limitation has hindered the scalability of conic optimization, particularly in finance, machine learning, and market equilibrium models.
Now, the Primal-Dual Conic Solver (PDCS) extends PDLP to handle general conic linear programming problems, offering a GPU-accelerated alternative to traditional solvers. By leveraging customized rescaling and projection methods tailored for CLPs, researchers from COPT, SJTU, and MIT found that PDCS can efficiently handle a wide range of problem structures [3]. Our benchmarking results demonstrate that PDCS consistently outperforms interior-point methods on our large-scale test cases, including Fisher Market Equilibrium, Lasso regression, and Multi-Period Portfolio Optimization.
PDCS for Multi-Period Portfolio Optimization
Portfolio optimization is one of the most important applications of second-order cone programming (SOCP). Multi-period portfolio optimization extends traditional single-period models by considering investment decisions over multiple time periods, allowing for dynamic rebalancing to maximize long-term revenues. However, this approach significantly increases the problem size, making it computationally challenging for traditional solvers.
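To see where the second-order cones enter, consider a generic multi-period mean-variance formulation (an illustrative sketch, not necessarily the exact model benchmarked below): holdings are chosen per period to maximize expected return while each period's risk is capped.

```latex
% Illustrative multi-period mean-variance SOCP with T periods:
%   x_t: holdings in period t, mu_t: expected returns,
%   Sigma_t: return covariance, gamma_t: risk budget, w_t: wealth.
\begin{aligned}
\max_{x_1,\dots,x_T} \quad & \sum_{t=1}^{T} \mu_t^{\top} x_t \\
\text{s.t.} \quad & \mathbf{1}^{\top} x_t = w_t, && t = 1,\dots,T, \\
& \bigl\| \Sigma_t^{1/2} x_t \bigr\|_2 \le \gamma_t, && t = 1,\dots,T, \\
& x_t \ge 0, && t = 1,\dots,T.
\end{aligned}
```

Each norm constraint is a second-order cone membership, and since the constraints repeat for every period, the instance size grows linearly with the number of time periods, which is the growth visible in Table 4.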
The table below compares solution times for a multi-period portfolio optimization with 844 securities, using COPT’s interior-point method versus the first-order solver PDCS.
| Number of Time Periods | Variables | Constraints | Non-Zeros | IPM (COPT) [s] | PDCS [s] |
| --- | --- | --- | --- | --- | --- |
| 48 | 101,808 | 410,256 | 10,891,238 | 9.32 | 7.18 |
| 96 | 203,616 | 820,512 | 21,782,678 | 19.99 | 10.67 |
| 360 | 763,560 | 3,076,920 | 81,685,598 | 117.74 | 51.30 |
| 720 | 1,527,120 | 6,153,840 | 163,371,398 | 369.50 | 70.82 |
| 1440 | 3,054,240 | 12,307,680 | 326,742,998 | >3600 | 491.04 |
| 2880 | 6,108,480 | 24,615,360 | 653,486,198 | >3600 | 580.87 |

Table 4. Solution time (seconds) of the SOCP instances of the multi-period portfolio optimization problem, using the interior-point method (COPT) and the first-order method (PDCS).
Key Takeaways from Portfolio Optimization Benchmarking
- PDCS scales significantly better than IPM as problem size increases
- For large instances (1,440+ time periods), IPM fails to converge within 1 hour, whereas PDCS finds solutions in minutes
- GPU acceleration provides a clear advantage in handling massive SOCP problems
PDCS for Fisher Market Equilibrium
The Fisher Market Equilibrium problem is a fundamental optimization challenge in economics and resource allocation. It involves computing the optimal allocation of goods among buyers, balancing supply and demand in competitive markets. Traditionally, exponential cone programming is used to formulate and solve this problem, with the interior-point methods being the default approach.
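For linear utilities, the standard formulation is the Eisenberg-Gale convex program (sketched below in generic form as an illustration): its logarithmic objective is what turns into exponential cone constraints when the problem is handed to a conic solver.

```latex
% Eisenberg-Gale program for a linear Fisher market:
%   buyer i has budget w_i; good j has unit supply;
%   u_{ij} is buyer i's utility per unit of good j; x_{ij} is the allocation.
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{i} w_i \log\Bigl( \sum_{j} u_{ij}\, x_{ij} \Bigr) \\
\text{s.t.} \quad & \sum_{i} x_{ij} \le 1 \qquad \text{for every good } j.
\end{aligned}
```

An optimal x is an equilibrium allocation, and the dual multipliers of the supply constraints give the market-clearing prices; each logarithmic term is modeled via an exponential cone, which is how the exponential cone instances in Table 5 arise.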
However, as markets grow larger and more complex, solving these equilibrium problems becomes computationally prohibitive. PDCS introduces GPU-accelerated first-order methods to dramatically improve scalability, making it possible to compute equilibria for large-scale markets that were previously unsolvable.
The table below compares solution times for computing equilibria in Fisher Market problems of increasing size.
| Number of Goods | Number of Buyers | Variables | Constraints | Non-Zeros | IPM (COPT) [s] | PDCS [s] |
| --- | --- | --- | --- | --- | --- | --- |
| 10,000 | 50 | 5.2×10⁵ | 4.0×10⁴ | 2.5×10⁴ | 3.3 | 3.5 |
| 100,000 | 50 | 5.2×10⁶ | 4.0×10⁵ | 1.0×10⁶ | 243.3 | 14.1 |
| 100,000 | 500 | 5.02×10⁷ | 4.0×10⁵ | 1.0×10⁷ | >3600 | 422.8 |

Table 5. Solution time (seconds) of the exponential cone programming instances of the Fisher Market Equilibrium problem, using the interior-point method (COPT) and the first-order method (PDCS).
Key Takeaways from Fisher Market Equilibrium Benchmarking
- PDCS achieves near-identical performance to IPM for small-scale problems
- For large-scale markets (100K+ goods), PDCS outperforms IPM by orders of magnitude
- For massive instances, IPM fails to solve the problem within an hour, while PDCS delivers solutions in minutes
These results highlight the game-changing potential of GPU-accelerated conic solvers in economic modeling, making large-scale equilibrium problems tractable for the first time.
Scaling Semidefinite Programming with GPU-Accelerated First-Order Methods
Semidefinite Programming (SDP) is a powerful optimization framework with applications spanning machine learning, signal processing, combinatorial optimization, and control engineering. However, as problem sizes continue to grow, existing solution approaches face severe computational and memory challenges, making large-scale SDP problems increasingly difficult to solve efficiently.
Traditional interior-point methods (IPM) provide high-precision solutions but require constructing and factorizing large Karush-Kuhn-Tucker (KKT) systems at each iteration. This results in steep growth in both computation time and memory usage, making these methods impractical for large-scale SDP instances. At the same time, while first-order methods such as the augmented Lagrangian method and the alternating direction method of multipliers (ADMM) avoid expensive Newton system factorizations, they still operate on the full n×n matrix variable, leading to rapid increases in computational overhead and memory demands as dimensions and constraints grow.
To address these issues, low-rank factorization has emerged as a breakthrough approach, drastically reducing the complexity of SDP problems by representing the high-dimensional matrix variable as the product of one or two low-rank factors. Among these methods, the Burer–Monteiro factorization has gained significant traction in recent years, demonstrating remarkable efficiency improvements in large-scale SDP.
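The Burer-Monteiro idea can be stated in two lines. For a generic standard-form SDP (an illustrative statement, not the exact algorithmic form used in cuLoRADS):

```latex
% Standard-form SDP over an n x n positive semidefinite variable X:
\min_{X \succeq 0} \ \langle C, X \rangle
\quad \text{s.t.} \quad \langle A_i, X \rangle = b_i, \qquad i = 1,\dots,m.
% Burer-Monteiro: substitute X = U U^T with U in R^{n x r} and r << n:
\min_{U \in \mathbb{R}^{n \times r}} \ \langle C, U U^{\top} \rangle
\quad \text{s.t.} \quad \langle A_i, U U^{\top} \rangle = b_i, \qquad i = 1,\dots,m.
```

The semidefiniteness constraint disappears, since UUᵀ is positive semidefinite by construction, and storage drops from n² entries to nr, at the price of nonconvexity; under suitable rank conditions the factorized problem is nonetheless reliably solvable by first-order methods.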
GPU-Accelerated Low-Rank SDP: A New Frontier
Recent advances in GPU-based parallel computing have dramatically enhanced the scalability of low-rank SDP solvers. By integrating low-rank factorization techniques with GPU acceleration, researchers from COPT, SJTU, and UIUC have unlocked orders-of-magnitude improvements in both speed and memory efficiency for large-scale SDP problems with their open-source software cuLoRADS [4].
For instance, in MaxCut problems, matrix dimensions can now scale into the hundreds of millions, while in matrix completion tasks, the method can efficiently handle hundreds of millions of observed constraints. Even at such extreme problem sizes, GPU-accelerated solvers converge within seconds to minutes, whereas traditional CPU-based solvers either fail to complete in a reasonable time or run out of memory.
Benchmarking GPU-Accelerated SDP
- Speedups range from hundreds to tens of thousands of times compared to CPU solvers
- Scalability advantages grow rapidly as problem dimensions increase
- Parallelized computation on GPU allows massive SDP problems to be solved in real-time applications.
The GPU acceleration is particularly effective because it aggregates numerous small-scale, sequential SDP computations into large-scale batch operations, making them highly efficient for GPU parallelism. Additionally, memory compression techniques for sparse structures minimize memory overhead, reducing redundant calculations and maximizing hardware utilization.
GPU-Accelerated SDP: Bridging Quantum Chemistry and Quantum Computing
GPU-accelerated SDP methods are also transforming quantum computing, particularly in simulating complex electronic structures in quantum chemistry. Quantum simulations rely heavily on accurate representations of molecular Hamiltonians, which involve challenging semidefinite programming (SDP) problems. Traditional SDP solvers are limited by computational and memory constraints, making large-scale simulations impractical. However, recent GPU-based SDP algorithms have dramatically enhanced scalability, enabling the efficient handling of massive optimization tasks.
Several researchers [5] have leveraged GPU-accelerated SDP solvers to significantly improve quantum algorithms. Techniques like Spectrum Amplification (SA) and Double-Factorized Tensor Hypercontraction (DFTHC) require optimal sum-of-squares (SOS) representations, which are computed efficiently using cuLoRADS. For example, quantum resource requirements for simulating electronic structures, such as Iron-Sulfur complexes and CO₂-fixation catalysts, have been reduced by factors ranging from 4 to nearly 200 compared to previous methods. These improvements directly enhance the practicality of quantum simulations, significantly reducing the computational cost of estimating ground-state energies.
In essence, GPU-accelerated SDP technology serves as a critical bridge, facilitating transformative advancements in quantum computing applications. It enables efficient quantum simulations of complex molecular systems, bringing previously intractable quantum chemistry problems within practical computational reach.
The Future of GPU-Accelerated Optimization: Scaling Innovation with NVIDIA cuOpt and COPT Collaboration
The potential of GPU-accelerated optimization extends far beyond first-order methods, opening the door for faster, more scalable, and more efficient solvers across a variety of optimization domains. With NVIDIA cuOpt now open-source, researchers, developers, and industry leaders have greater flexibility to customize and integrate GPU-powered solvers into their workflows, accelerating real-time decision-making in industries such as logistics, finance, energy, economics, and operations research.
One of the most promising areas of GPU acceleration is mixed-integer programming (MIP), where combinatorial complexity often makes traditional solvers computationally prohibitive. Recent advancements in GPU-based MIP solvers, developed in collaboration with the NVIDIA cuOpt team, demonstrate significant progress in improving solution quality and reducing computational overhead. These improvements pave the way for more efficient, large-scale decision-making, making high-performance MIP solving more accessible across multiple industries.
Beyond MIP, we are also actively exploring GPU acceleration for interior-point methods. Unlike first-order methods, IPM relies on symmetric matrix factorization, an operation that can nonetheless benefit substantially from parallelism on both CPUs and GPUs. While still in development, early results are promising: our GPU-accelerated barrier LP solver is already 20% faster on an NVIDIA RTX 4090 compared to the best CPU-based implementation. As we continue refining and optimizing this approach, we anticipate even greater performance gains, extending GPU acceleration to a broader range of mathematical optimization algorithms.
Looking ahead, with NVIDIA cuOpt now becoming open-source, COPT gains unprecedented opportunities to leverage the strengths of NVIDIA hardware and the CUDA ecosystem. This will enable significantly greater GPU acceleration across a wider range of optimization algorithms, from first-order methods to more advanced techniques. As a result, GPU-powered solvers within COPT can be more effectively applied to diverse optimization scenarios, including linear, mixed-integer, and nonlinear programming problems. The integration of cuOpt with the COPT solver marks an exciting step toward real-time, large-scale combinatorial optimization. This collaboration is set to redefine performance standards in mathematical optimization, enabling breakthroughs in AI-driven operations research, industrial automation, and intelligent decision-making. As GPU-powered solvers continue to evolve, they will play a critical role in tackling complex, large-scale challenges, pushing the boundaries of what's possible in high-performance optimization.
References:
1. H. Lu, J. Yang, H. Hu, Q. Huangfu, J. Liu, T. Liu, Y. Ye, C. Zhang, D. Ge. cuPDLP-C: A Strengthened Implementation of cuPDLP for Linear Programming by C language. arXiv preprint arXiv:2312.14832, 2023.
2. Y. Huang, W. Zhang, H. Li, D. Ge, H. Liu, Y. Ye. Restarted primal-dual hybrid conjugate gradient method for large-scale quadratic programming. arXiv preprint arXiv:2405.16160, 2024.
3. Z. Lin, Z. Xiong, D. Ge, Y. Ye. A GPU-Accelerated First-Order Algorithm for Second-Order Cone Programming. Draft, 2025.
4. Q. Han, Z. Lin, H. Liu, C. Chen, Q. Deng, D. Ge, Y. Ye. Accelerating Low-Rank Factorization-Based Semidefinite Programming Algorithms on GPU. arXiv preprint arXiv:2407.15049, 2024.
5. G. H. Low, R. King, D. W. Berry, Q. Han, A. E. DePrince III, A. White, R. Babbush, R. Somma, N. Rubin. Fast quantum simulation of electronic structure by spectrum amplification. arXiv preprint arXiv:2502.15882, 2025.