
OpenNMS Update v2.10

Date: Tue, 6 Mar 2001 21:51:58 -0600 (CST)
From: announce-admin@opennms.org
To: announce@www.opennms.org
Subject: [OpenNMS-Announce] OpenNMS Update v2.10


OpenNMS Update


Vol. 2, Issue 10


March 6, 2001


In this week’s installment…

     * Project Status
          + Challenging Week
          + New Releases Pending
          + Office Move - Still Pending
          + Coding Projects Underway
     * Upcoming Road Shows
     * Early Adopter Program Status
     * The Wish List

Project Status


Challenging Week:

So you think you’ve had router problems…

Last week, our web site/email server/CVS tree was not
accessible for about 7 hours on Thursday afternoon. Since
everything is hosted in Kansas City (at our parent company’s co-lo
site), it’s difficult for us to do much hands-on troubleshooting.
So, whenever things go down, we take the following steps:

* Blame BellSouth. We’re on xDSL from our current offices, and
with the stunning reliability of their service offering, this is
usually a pretty safe bet. It all goes back to Occam’s
Razor…

* Either wait 5 minutes or reboot the xDSL modem and/or our NAT
gateway.

* If the rest of the world is reachable except for our stuff,
ping www.atipa.com. Since they are co-lo’ed at the same site, it’s
a good check to see if it is just us or a connectivity issue.

* If it’s not just us, call Jay, our fearless (and remarkably
talented) local sys-admin in KC.

* If it is just us, call Jay. He usually doesn’t have to do
anything, but he calms us down and the problem is generally cleared
by the time we’re off the phone. We chalk it up to “black magic”.
Jay chalks it up to PEBKAC (Problem Exists Between Keyboard And
Chair).

* If all else fails, whine to Ben.

Well, last week, while following these instructions, we hit two
snags we hadn’t encountered before. We couldn’t get to our
equipment or the Atipa gear either. So call Jay. Jay wasn’t there.
He was at another Atipa facility in New Hampshire. Thus, screwed we
were. But, we didn’t yet know to what degree.

So, we then call Jay’s back-up in KC. After seriously
exhausting our ideas about what the problem could possibly be, our
crack team of technicians notes something peculiar: the router is
gone. Not gone as in “not working”, or even gone as in “smoldering
in a heap in the corner”. This particular “gone” is as in stolen.
Snatched. Pilfered. An extremely neat and tidy little 1U rack space
opened up for us involuntarily by a third-party technician. It was
gone.

Fortunately, upon noticing this trifling bit of technical
detail, we were able to call in the local router-jock-for-hire
company (who did a great job under odd circumstances, but whose
name I’ve since forgotten), and they configured up a spare, which
was hidden in a somewhat less conspicuous spot. Considering that
we were shy a router and had to involve a hired-gun router jock at
the last minute, I figure 7 hours really isn’t that bad. Hell,
Mike’s broken the build for longer than that, and more than once!

Anyway, that was the excitement for last week. Well, part of
it…

This week’s excitement technically started last Friday, when
BellSouth installed our new phone lines at our new digs. Since none
of us have actually been there yet, I’m going to give them the
benefit of the doubt and assume that they are installed, up and
working. Why give them this benefit? Only because they so
thoroughly proved their efficiency at disconnecting the phone and
xDSL connections at our current office space.

Evidently, every company that ever moves always moves their
service. They never install new service at a new facility and leave
their existing service in place to allow for overlap.
Evidently.

So we find out that the data service is down late Friday
evening. Since it is down with such shocking regularity, we didn’t
think much of it. When it was still down all day on Saturday, we
called in to BellSouth FastAccess support (1-888-321-2375, ask for
Sherard and tell him Shane sent ya) and were told that BellSouth
was “experiencing problems in the Raleigh area”. So, we thought
nothing more of it. Until Sunday, when it still wasn’t up, at which
point it was easier to assume we just needed to reboot the modem
than to actually drive in and verify it. Bad move #1.

When we got in Monday morning, we found out that both data and
voice lines had been disconnected. The FastAccess billing folks
told us that what was a T-order (To:) had been completed as a
T&F-order (To: and From:), meaning that our service now existed
at the new office. Which is great for the painters and
carpet-layers who are currently occupying that space. We, in the
meantime, were officially screwed, with a big ol’ BellSouth seal of
approval.

So the lady at BellSouth billing who I had to deal with (who was
very pleasant and was trying to help) actually got our phone line
repair expedited and said that she’d turn the xDSL over to the xDSL
support group. At this point, I had no idea how badly I would be in
need of an xDSL support group…

Shane: Hi, my name is Shane, and I’m an xDSL user.
Chorus: (in unison) Hi, Shane!

So, 3:30pm Monday rolls around and finally–dial-tone! Sweet,
sweet dial-tone. And no xDSL.

So, I call BellSouth FastAccess support and talk to someone who
says that they can’t put xDSL service on that line because there is
a disconnect order on that line. So I ask them how they can take
that order off, and they tell me they can’t, without disconnecting
the line, of course. Well, if that’s not an option, we’ll have to
re-provision xDSL, and that will take a minimum of another 4 days.
This is despite an earlier rep telling me that it was just a matter
of “turning it on” once the phone service was in.

Unsettled and not content with that answer, I hang up and call
right back and get someone who knows what they are doing. Or at
least that’s what I assumed, since they were telling me what I
wanted to hear. After 2.6 hours on hold (check the logs), and me
having them call me back on my cell so I could at least go home, my
man Sherard calls me back and says no sweat, all I’ve got to do is
call in the next morning at 8am and tell billing to turn it on.
Piece of cake. So until 8am, I drink hard and sleep well.

The day is now Tuesday. The time is 7:45am and I’m in my car on
the way to work and anxiously watching the clock on my dashboard so
I can call in right at 8am. At 7:59, I start the process, so as to
allow time to get through the menus. 8am hits, and I’m on hold
again, being advised that an operator would be with me in less than
one minute. 2 minutes later (but I’ll let it slide), I start my
spiel again. Finally, I’m advised that whoever told me that all I
had to do to get my service back up today was to call in just
didn’t know what they were talking about! I truly hope Sherard is
reading this to hear just how quickly his co-workers turned on him.
They tell me that the best they can do is 4 days. So I play that
ultimate trump card–“Can I speak with your supervisor?”

Mitchell hops on the phone roughly 5 minutes later totally
unapprised of my situation, so again I roll with the spiel. He
confirms to me that Sherard (without mentioning a name) was
certifiably nutso, but that he’d try to help. So he put me on hold
for 15 minutes while trying to reach xDSL provisioning, who told
him we couldn’t do anything until we cancelled our service
cancellation order (the “F” in T&F), which we never placed in
the first place. So to make everyone happy, we cancelled our
unordered cancel order. And now I’m getting dizzy.

Finally, they say “hopefully, sometime today”. I then get to the
office to find Ben on the phone with them as well. After trying the
whole thing from another angle, we end up with the same answer. We
then focused on getting a dial-up connection from our NAT gateway
up so we could at least get email. God bless wvdialconf, possibly
the greatest utility ever written for the Linux platform. We were
soon up and running, but on the only dial-up line in the office,
which was also the line that BellSouth was going to call back on.
Nonetheless, we went forward and dared them to call us.

After another full day of being down (except we now had a shared
dial-up connection and an rsync-ed copy of CVS so we could work), I
left at 5pm (?!?!?) to write this from home.

Anti-climactic Conclusion: I just checked and the connection is
now up. The only remaining questions are: how much is BellSouth
going to charge us for a disconnect and an expedited re-connect,
and how much is Mindspring going to charge me for leaving my
dial-up account nailed up all day. And of course the ultimate
question: Do the lines at the new office really work?

Current odds around the office are 8:1 against it, and I’m
taking as much of that action as I can get. And besides, if they
aren’t up, I might at least get to break the news to Sherard about
his fickle co-workers…

New Releases Pending:

With the merging of development branches that began today going
well, we anticipate a 0.7.1 release as early as Friday of this
week, based on tomorrow night’s CVS snapshot.

Pending success there, we’ll likely have a 0.6.2 stable
release, including RPMs, available sometime early the following
week.

These new releases will include some bug-fixes and more Web UI
functionality. They’ll be worth the download for the bug-fixes, but
you’ll stay for the functionality.

Watch for these later on. They’ll also be announced on
Freshmeat, as always.

Office Move – Still Pending:

Some minor construction still going on, as well as questionable
data service. The furniture is supposed to be delivered on the 20th
or so, and we’ve got some hardware that’s supposed to show up not
too long after that, so we’ll probably be in later that week.

The offices are going to be a refreshing change of scenery for
us. It gets old when the most interesting part of your work
environment is the fact that you are between the high school and
the mall. We can’t count on power, and we sure can’t count on phone
service, but we can bank on the steady stream of high school kids
showing us a) their baggy pants, b) their cool tattoos, c) their
haircuts (très chic), or d) just how old we really are.

What will we miss about the current offices? Oh, there’s plenty.
For example, the family of ladybugs that lives in the window track
(yes, we work in a building where the windows actually open!). And
of course, the rust-stained ceiling tiles on the second floor of a
three-story office building. How’d they get stained? Your guess is
as good as mine. And I would be remiss not to mention old Mr.
Chalky, the faint chalk outline of the prior office resident on the
floor.

But enough of my rantings; don’t we actually work for a
living?

Coding Projects Underway:

* Snort Integration — Initial design work is underway, with
some pre-alpha functionality demo’d in Perl. Need to do some
serious nuts-and-bolts analysis of this integration before
proceeding. Still very early in this effort.

* Solaris Port Postgres Procedures — Underway. No update.

* Postgres for NT — As far as we know, this will work, but we
still haven’t heard back definitively from someone who has tested
it. There are some additional hurdles to jump for the Win32
platform, now that we have a dependency on a portmap service for
NT…

* Portmap for NT — There is one that ships with NT/2000 that
_should_ work, but we haven’t tested it. There is another one
referenced at http://www.plt.rwth-aachen.de/ks/english/oncrpc.html
which is basically from the same project as the Java RPC libraries
we are using. This is probably worth a look for those of you
interested in running on NT.

* SNMP Poller/Data Collection — The Web UI is alive, and we are
talking about some tweaks to the default RRD formats. Thoughts on
this? Let us know.

* Event DTD — Changed yet again.

* User Interfaces — Some bug fixes are in. Others pending.
Larry’s still adding features/functionality to the Web UI.

* SCM UI — Replaced with “./opennms.sh scm status”

* LDAP Poller — We’re in the infancy of this one. If you want
in, let me know. (A rough sketch of the basic idea appears just
after this list.)

* Maji Prelim Work — Rick is building Perl code that is
successfully parsing MIB files. Check him out, in all his glory, on
the “events” list.

* Notification Configuration — Actively being moved to the Web
UI.

* Swing Interface — Fighting random oddities. Proceed with
caution.

* Discovery/CAPSD/Database Review — Revisiting the way
Discovery and capsd communicate, verifying that stuff is accurately
written to the database, and adding some maintenance functionality
we didn’t have previously. Mike’s the man!
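
For the curious, here’s roughly the smallest check an LDAP poller
could be built around, written against plain JNDI. This is a
hedged sketch: the LdapCheck class and isUp() method are ours for
illustration, not anything in the OpenNMS tree. It calls a server
“up” if an anonymous bind succeeds:

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingException;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;

    // Hypothetical example class -- not part of OpenNMS.
    public class LdapCheck {
        // "Up" means an anonymous bind to ldap://host:port succeeds.
        public static boolean isUp(String host, int port) {
            Hashtable env = new Hashtable();
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://" + host + ":" + port);
            try {
                DirContext ctx = new InitialDirContext(env);
                ctx.close();
                return true;
            } catch (NamingException e) {
                return false;  // connect or bind failed -- mark it down
            }
        }

        public static void main(String[] args) {
            System.out.println(isUp(args[0], 389) ? "up" : "down");
        }
    }

A real poller would need timeouts and retries on top of this, but
it shows how little is required to get a basic up/down answer.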


Upcoming Road Shows


Hopefully, we’ll have to add a regular section on “Seeing
OpenNMS in Print”! If you aren’t on Network World Fusion’s email
list on network and systems management, you missed their article on
ten cool open source network management tools, which mentioned us.
Kind of a goofy article overall, but hey, it’s nice to see the name
in print.

There’s also a nasty rumor about this month’s issue of
Enterprise Linux magazine, but I’ll believe it when I see it…

On with the road shows…

* May 5th – Twin Cities LUG, Minneapolis, MN

* May 10th – Boulder LUG, Boulder, CO

* June 1st – NOVALUG BBQ!! Fire-eaters Unite!!

* June 2nd – Northern Virginia LUG (NOVALUG), Alexandria, VA

* June 11-15 – OpenView Forum 2001, New Orleans, LA

* July 23-27 – O’Reilly Open Source Convention, San Diego,
CA

For additional details on these appearances and others, check
out the web site at http://www.opennms.org/sections/opennms/events


Early Adopter Program Status


Jeff has had some minor successes. At one site, Jeff was
fighting with notification (which is in the product and works,
thank you very much), and was having some problems with false
outages. He tweaked a few parameters he hadn’t tweaked before
(maybe hadn’t SEEN before), and suddenly that was fixed, and we may
also have exposed a misconfiguration that could have been causing
other problems. Gotta love the minor wins!

We’ve added another site to the EAP program, and are getting
close to a saturation point. If you or your company might be
interested in participating, go to the web site and fill out the
form. Luke and Jeff will be in touch.


The Wish List


Later this week, we’ll begin working with our first potential
contributor working under a government grant. That whole deal is
not yet final, so we’re not resting on those laurels yet, but hey,
if Uncle Sam wants to give us some money, all we need is an
ethernet jack in Cheney’s pacemaker, and we’ll do our best to help
out where we can.

Otherwise, on with the list…

* In the 0.6.x release (and CVS), check out the TODO file

* More Data Collection configs wanted for the
DataCollection.xml

* Any interest in more TCP pollers? Let us know (or better yet,
build one yourself… a bare-bones starting point follows this
list)

* LDAP Poller

* nmap Poller (That idea came in via email this week. Cool!)

* Documentation and development your game? How about a white
paper on how to extend OpenNMS with custom pollers, custom configs,
and/or your own scripts/code.

* Testing on new, exciting platforms is always appreciated.
Somebody want to mess with the Cygwin port of our Postgres stored
procedures and see where we stand?

* Any additional help we can get proving our documentation
either right or wrong is appreciated. Thanks.

* Got any creative applications for OpenNMS that we haven’t
considered? Let us know!

* A security analysis of OpenNMS?
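
And for anyone tempted by that TCP poller item, here’s about the
smallest thing that could call itself one: a service is “up” if a
TCP connection to its port can be opened. Another hedged sketch
(the TcpCheck class is ours for illustration, not shipping code);
a real poller would add timeouts, retries, and probably a banner
check:

    import java.io.IOException;
    import java.net.Socket;

    // Hypothetical example class -- not part of OpenNMS.
    public class TcpCheck {
        // "Up" means a TCP connection to host:port can be opened.
        public static boolean isUp(String host, int port) {
            try {
                Socket s = new Socket(host, port);
                s.close();
                return true;
            } catch (IOException e) {
                return false;  // refused, unreachable, and so on
            }
        }

        public static void main(String[] args) {
            System.out.println(isUp(args[0],
                    Integer.parseInt(args[1])) ? "up" : "down");
        }
    }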


Afterthoughts


Following the first section, I’m just about all ranted out. So
I’ll take this chance to catch up on the comments we’ve been
receiving regarding name resolution and dependencies between an NMS
and external DNS servers.

We’ve discussed several options, including running a caching
DNS locally (which still leaves you with dependencies on an
external DNS), creating our own local /etc/hosts file for exclusive
name resolution, built from an nslookup or zone transfer script
(kludgy at best), and then the solution we’re pretty much settled
on, which comes pretty close to Roger Zenker’s description.
Somebody buy that man a beer.

In brief, what we’re currently thinking about doing is resolving
the IP address to a name as we do the capabilities check on a node
at discovery time, and writing that name to the database, then
refreshing that name during the capabilities re-scan, which
happens, by default, on a 24-hour interval (but is configurable).
Of course, we’ll also provide a utility to force that change if you
need it to happen prior to the re-scan.
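
To make that concrete, the core of the idea is nothing more exotic
than a reverse lookup at scan time. A minimal sketch in Java (the
NodeNamer class is hypothetical; where the result actually gets
written is up to whatever owns the node table):

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    // Hypothetical example class -- not part of OpenNMS.
    public class NodeNamer {
        // Resolve an address back to a name at (re)scan time,
        // falling back to the raw address if resolution fails.
        public static String nameFor(String ipAddress) {
            try {
                InetAddress addr = InetAddress.getByName(ipAddress);
                // getHostName() performs the reverse (PTR) lookup and
                // returns the dotted quad itself if there is no PTR.
                return addr.getHostName();
            } catch (UnknownHostException e) {
                return ipAddress;
            }
        }
    }

The interesting part isn’t the lookup, it’s the refresh: re-running
this during the capabilities re-scan is what keeps the database
from going stale when names change.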

Your ideas were all helpful (except for the questions about how
we, Snort, and Samba use different files for syslogging?!?!), and
all figured into the direction we’re currently pursuing. Thanks
again for the open discussion and stay tuned–there are plenty more
questions to come, like this one:

What’s the best algorithm for associating a name with a node?
DNS only associates a name with an IP address, which is associated
with an interface, and by definition, a node can have more than one
interface. So which is the right name to associate with it? We’re
familiar with OpenView’s algorithm, which seems reasonably good,
but are the situations that aren’t well-addressed by it (briefly,
it’s node name equals the first of whichever is available: SNMP
sysName, hostname for software loopback interface, or hostname of
interface with lowest-numbered IP address.) So, whaddaya think?
Please take your responses to the [discuss] list.
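
For reference, that precedence is mechanical enough to state in
code. Here’s a hedged sketch of our reading of it; the
NodeNameChooser class and its Iface type are made up for
illustration, standing in for whatever the database would actually
hand back:

    import java.util.Iterator;
    import java.util.List;

    // Hypothetical example class -- our reading of the OpenView-style
    // precedence, not code from OpenNMS or OpenView.
    public class NodeNameChooser {

        // Stand-in for an interface row from the database.
        static class Iface {
            String hostname;   // from reverse DNS; may be null
            long ip;           // dotted quad packed into a long, for ordering
            boolean loopback;  // true for the software loopback interface
            Iface(String hostname, long ip, boolean loopback) {
                this.hostname = hostname;
                this.ip = ip;
                this.loopback = loopback;
            }
        }

        static String chooseName(String snmpSysName, List ifaces) {
            // 1. SNMP sysName, if the node answered SNMP at all
            if (snmpSysName != null && snmpSysName.length() > 0)
                return snmpSysName;
            Iface lowest = null;
            for (Iterator i = ifaces.iterator(); i.hasNext();) {
                Iface iface = (Iface) i.next();
                // 2. hostname of the software loopback interface
                if (iface.loopback && iface.hostname != null)
                    return iface.hostname;
                if (lowest == null || iface.ip < lowest.ip)
                    lowest = iface;
            }
            // 3. hostname of the interface with the lowest-numbered IP
            return (lowest != null) ? lowest.hostname : null;
        }
    }

One soft spot this makes obvious: if there’s no sysName and no
loopback hostname, and the lowest-numbered interface has no PTR
record, you end up with no name at all. Cases like that are exactly
what we’d like to hear about on [discuss].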

And as always, thanks for your support. Not you, Sherard.

XXXs and OOOs to BellSouth,

Shane O.
========
Shane O’Donnell
OpenNMS.org
shaneo@opennms.org

