How NetApp Helped NVIDIA Optimize Their Helix Environment

NVIDIA Corporation uses Perforce Helix not only to manage its software development process and chip designs, but also to provide change control for critical documents company-wide. This technical case study describes a deployment of Perforce Helix running on clustered Network Appliance™ storage configured to provide simultaneous Fibre Channel SAN and network-attached storage (NAS).

Optimized Development With Perforce Helix and NetApp


Automated repetitive tasks, saving valuable time

Faster, more reliable backup solutions


Complete disaster recovery functions

"Now it's easy for us to accomplish common storage tasks like volume expansion and backup. We don't have to pay someone else a fortune to do it."

 

Development Environment at a Glance

  • Company name: NVIDIA Corporation
  • Headquarters: Santa Clara, Calif.
  • Industry: PC Graphics
  • Number of users: 1,700
  • Total storage: more than 400 TB

 

Background

A global leader in advanced graphics processing technology, NVIDIA Corporation has received more graphics awards from the PC industry than any other company. Companies worldwide choose NVIDIA graphics processing units to enhance the digital media experience on desktops, workstations, notebooks, handhelds, and other devices. Headquartered in Santa Clara, Calif., NVIDIA has over 2,000 employees worldwide with revenue approaching $2 billion annually.

Over 800 NVIDIA engineers work on chip design and software development. The output from these groups is stored and managed using Perforce Helix Core. In addition to engineering data including all final chip designs and software source trees, other departments use Helix Core to store critical documents. In total, NVIDIA has over 1,700 Helix Core users — 85 percent of the company.

Ensuring high performance and availability of this critical business application is a key priority for the NVIDIA IT department. NVIDIA's original Perforce Helix Core infrastructure used a pair of Sun servers configured with third-party clustering software and a monolithic storage system from a major storage vendor. As NVIDIA's processing and storage requirements expanded, the IT team found this fixed architecture was expensive, complicated, and difficult to manage.

The highly complex storage environment on which Helix Core was deployed made even a simple configuration change a coordinated effort that often required bringing all three vendors on site. NVIDIA's storage administrator manages all of the engineering group's storage needs, including over 40 NetApp storage systems with more than 400TB of total capacity. Even so, NVIDIA found that the single system supporting its Helix Core environment was taking up most of the administrator's time.

Based on its long-term relationship with NetApp, NVIDIA decided to replace its existing storage solution with an easy-to-manage cluster of NetApp fabric-attached storage (FAS) systems capable of simultaneously supporting both Fibre Channel SAN and network-attached storage protocols. Key goals were to reduce the administrative burden while providing high availability and disaster recovery capabilities.

 

Unified Storage Configuration

Perforce Helix Core is designed with a client/server architecture. The Perforce server maintains a centralized repository while client workspaces reside on network storage. Helix Core tracks all client workspaces, as well as the evolution and contents of its repository, through a central metadata database.

Since high availability was a critical requirement, extreme care was taken to ensure redundancy at all levels for the Helix Core environment. The original Sun server cluster was already configured for high availability, and two Brocade Fibre Channel switches were used to create a redundant SAN fabric between the Sun servers and the NetApp FAS cluster. Sun servers are configured with dual Fibre Channel HBAs, while each FAS system has four HBAs to provide optimum performance and multipath reliability.

Within the NetApp cluster, each FAS system has different responsibilities under normal operation. One system, designated P4SAN, has primary responsibility for 4TB of SAN storage used by the Perforce Helix Core metadata database. The second, P4NAS, has primary responsibility for 16TB of network-attached storage, which is directly accessible from NVIDIA's IP networks and used for client workspaces. Should either member of the NetApp cluster fail, the surviving member immediately assumes that workload in addition to its own.
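
The takeover behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not NetApp's cluster failover implementation: the node names and capacities come from the text, while the `takeover` function and the ownership map are hypothetical names introduced for illustration.

```python
from typing import Dict, List


def takeover(owners: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return a new ownership map in which the survivor serves both workloads.

    Hypothetical helper: models a two-node cluster only.
    """
    survivors = [node for node in owners if node != failed]
    if failed not in owners or len(survivors) != 1:
        raise ValueError("expected a two-node cluster containing the failed node")
    survivor = survivors[0]
    # The survivor keeps its own workload and adds the failed node's workload.
    return {survivor: owners[survivor] + owners[failed]}


# Normal operation: each FAS system has primary responsibility for one workload.
cluster = {
    "P4SAN": ["SAN: Helix Core metadata database (4TB)"],
    "P4NAS": ["NAS: client workspaces (16TB)"],
}

after_failure = takeover(cluster, failed="P4SAN")
print(after_failure)
```

In this model the map returned after a P4SAN failure has a single key, P4NAS, which lists its own NAS workload first and the SAN workload it assumed second.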

The P4SAN system was initially configured with four 250GB LUNs, yielding 1TB of capacity to accommodate the central Helix Core repository. The P4NAS system is configured with a single large volume with 11TB total capacity. In both cases, underlying RAID groups consist of 16 disks and dual parity RAID (RAID-DP) is used to provide protection from multiple disk failures in a single RAID group.

With RAID-DP, each RAID group is allocated a second parity disk. This additional protection removes the risk of data loss from a double disk failure within a group, so larger RAID groups can be used, which makes disk configuration simpler and more flexible.
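
The capacity figures above can be checked with a little arithmetic. The sketch below assumes two parity disks per RAID-DP group and ignores spares, disk right-sizing, and file system overhead, so the numbers are upper bounds rather than NVIDIA's actual usable capacity. The function name is a hypothetical one chosen for illustration.

```python
def raid_dp_data_disks(group_size: int, parity_disks: int = 2) -> int:
    """Number of disks in one RAID group that hold data rather than parity."""
    if group_size <= parity_disks:
        raise ValueError("RAID group must be larger than its parity disk count")
    return group_size - parity_disks


GROUP_SIZE = 16                      # disks per RAID group, from the text
data_disks = raid_dp_data_disks(GROUP_SIZE)
efficiency = data_disks / GROUP_SIZE

# Initial P4SAN provisioning from the text: four LUNs of 250GB each.
lun_total_gb = 4 * 250

print(f"data disks per group: {data_disks}")            # 14
print(f"raw capacity holding data: {efficiency:.1%}")   # 87.5%
print(f"initial LUN capacity: {lun_total_gb} GB")       # 1000 GB
```

With 16-disk groups, the second parity disk costs one sixteenth of raw capacity compared with single parity, which is why larger groups keep the overhead modest.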

 

Accelerated Backup

A key benefit of the new storage solution is accelerated backups. Every four hours a Snapshot copy of changed data is created on the P4NAS system, and the data is asynchronously mirrored to a local NearStore® system using NetApp SnapMirror® software. The NearStore system is periodically backed up to tape over NDMP using VERITAS NetBackup software.
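
To make the four-hour cadence concrete, the following sketch lists the Snapshot times for one day. The midnight start and the calendar date are assumptions made for illustration; the text gives only the interval. The sketch also ignores SnapMirror transfer time, so the real age of the mirrored copy may be greater than the interval.

```python
from datetime import datetime, timedelta
from typing import List


def snapshot_times(day_start: datetime, interval_hours: int = 4) -> List[datetime]:
    """Return every Snapshot time within the 24 hours starting at day_start."""
    step = timedelta(hours=interval_hours)
    day_end = day_start + timedelta(days=1)
    times = []
    current = day_start
    while current < day_end:
        times.append(current)
        current += step
    return times


# Hypothetical day, assuming the schedule is aligned to midnight.
times = snapshot_times(datetime(2024, 1, 1))
for t in times:
    print(t.strftime("%H:%M"))

# Data written just after one Snapshot waits up to a full interval for the next.
worst_case_gap = times[1] - times[0]
print(f"copies per day: {len(times)}, worst-case gap: {worst_case_gap}")
```

Under these assumptions there are six Snapshot copies per day, and a change can wait up to four hours before it is captured.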

The Perforce servers mount P4NAS and store database checkpoints and journals there, so this critical information is backed up. The database stored on P4SAN is backed up nightly to tape and then restored to a staging area where journals are replayed to verify the integrity of the dumps.
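
The checkpoint and journal replay steps map onto standard `p4d` options: `-jc` takes a checkpoint and rotates the journal, `-jr` replays checkpoint or journal files into a server root, and `-xv` runs a low-level database validation. The sketch below only builds and prints the command lines and does not execute them. All paths and file names are hypothetical, and the text does not describe NVIDIA's actual scripts, so treat this as one possible shape of the procedure.

```python
import shlex
from typing import List


def checkpoint_cmd(server_root: str) -> List[str]:
    """Command to checkpoint the live database and rotate its journal."""
    return ["p4d", "-r", server_root, "-jc"]


def replay_cmd(staging_root: str, journal_files: List[str]) -> List[str]:
    """Command to replay journals onto a database restored in a staging area."""
    if not journal_files:
        raise ValueError("at least one journal file is required")
    return ["p4d", "-r", staging_root, "-jr", *journal_files]


def validate_cmd(staging_root: str) -> List[str]:
    """Command to run low-level validation of the staged database."""
    return ["p4d", "-r", staging_root, "-xv"]


# Hypothetical locations: the live root on P4SAN, journals stored on P4NAS.
commands = [
    checkpoint_cmd("/p4san/root"),
    replay_cmd("/staging/root", ["/p4nas/journals/journal.41"]),
    validate_cmd("/staging/root"),
]

for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes them easy to pass to `subprocess.run` later without shell quoting problems.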

 

Disaster Recovery

The NetApp storage cluster and SnapMirror software also give NVIDIA the ability to provide complete disaster recovery for its Helix Core data. NVIDIA maintains a disaster recovery facility in Sacramento, Calif., about 100 miles from its main facility in Santa Clara. Helix Core data is asynchronously mirrored to NetApp NearStore systems in Sacramento for protection from site-wide or regional disasters.

 

NetApp Impact

"We wanted to ensure that the new environment would be simple and fault tolerant and that the backend storage would not be a bottleneck," says Kelly Alexander, manager of Engineering Services for NVIDIA. "NetApp helped us achieve those goals.

"Our old system was very expensive to maintain," continues Alexander. "The maintenance cost alone far outweighed the purchase price of our NetApp solution."

Since the rest of NVIDIA's engineering environment already uses NetApp storage, the benefits for Rod Hernandez, NVIDIA's storage administrator, were substantial: "I was already very familiar with NetApp storage, and because the SAN piece is simple it only took a few hours of training for me to learn how to manage that, too. Standardizing on NetApp means I don't have to worry about Helix Core storage anymore. My life is much improved and many sleepless nights have been avoided since we upgraded to NetApp."

"Now it's easy for us to accomplish common storage tasks like volume expansion and backup. We don't have to pay someone else a fortune to do it. Not only does NetApp save NVIDIA a lot of administrative time, it's saving us a lot of money that we used to spend on professional services," adds Alexander. "NetApp just does what it's supposed to do without a lot of hassle. It gives us great availability, better data protection, and disaster recovery — all with truly lights-out operation."

 

Conclusion

With its most critical company data on NetApp storage, NVIDIA puts a lot of faith in NetApp technology. NVIDIA's new storage environment for Perforce Helix Core provides tremendous benefits for the company in terms of improved availability, data protection, and disaster recovery. NVIDIA depends on NetApp for the innovative storage solutions that allow it to continue to be successful.
