PR: Louisiana Tech’s RAS-ware Runtime Breakthrough in HPC Clusters

Written By
Web Webster
Nov 3, 2005

Louisiana Tech’s eXtreme Computing Research (XCR) group unveiled a
breakthrough development today in the RAS-ware runtime for
transparent job queue fault tolerance in HPC cluster
environments.

Dr. Box Leangsuksun, an associate professor in computer science,
explained that XCR’s breakthrough consists of High Availability,
Self-configuration, and Self-healing as enabling solutions. His
group of graduate students, led by Anand Tikotekar and Kshitij
Limaye, has implemented a proof-of-concept Beowulf cluster based on
HA-OSCAR 1.1 and a standard HPC resource management/job queue system
(e.g., PBS/TORQUE). Preliminary results suggest that MPI jobs can
continue their execution and that the job queue is preserved
regardless of failures at the head node and compute nodes.

The experiment runs standard, unmodified MPI jobs under LAM/MPI 7.0.
The runtime handles both running and queued jobs transparently, and
the queue order is maintained even in the face of a catastrophic
failure. The open source HA-OSCAR multi-head solution provides
failover capability and transparently recovers the job queue in the
event of a head-node outage.

“This is very exciting for us,” said Leangsuksun. “This marks a
major milestone in our overarching goal–toward non-stop services
in an HPC environment. We expect that our breakthrough technology
is exactly what the community has been waiting for.”

Leangsuksun continued, “Our breakthrough is also expected to be
part of the next HA-OSCAR release that will have broad impacts in
HPC and telecom cluster environments, especially for
mission-critical applications.”

This RAS-aware runtime breakthrough resulted from a collaboration
under the MOLAR project between Louisiana Tech’s eXtreme Computing
Research (XCR) group and the Network and Cluster Computing (NCC)
group at Oak Ridge National Laboratory (ORNL).
