OpenNMS Update v2.10
Mar 07, 2001, 08:19
By Shane O'Donnell
Date: Tue, 6 Mar 2001 21:51:58 -0600 (CST)
Vol. 2, Issue 10
March 6, 2001
In this week's installment...
* Project Status
  + Challenging Week
  + New Releases Pending
  + Office Move - Still Pending
  + Coding Projects Underway
* Upcoming Road Shows
* Early Adopter Program Status
* The Wish List
So you think you've had router problems...
Last week, our web site, email server, and CVS tree were inaccessible for about 7 hours on Thursday afternoon. Since everything is hosted in Kansas City (at our parent company's co-lo site), hands-on troubleshooting is difficult for us. So, whenever things go down, we take the following steps:
* Blame BellSouth. We're on xDSL at our current offices, and given the stunning reliability of their service offering, this is usually a pretty safe bet. It all goes back to Occam's Razor...
* Either wait 5 minutes or reboot the xDSL modem and/or our NAT gateway.
* If the rest of the world is reachable except for our stuff, ping www.atipa.com. Since they are co-lo'ed at the same site, it's a good check to see if it is just us or a connectivity issue.
* If it's not just us, call Jay, our fearless (and remarkably talented) local sys-admin in KC.
* If it is just us, call Jay. He usually doesn't have to do anything, but he calms us down and the problem has generally cleared by the time we're off the phone. We chalk it up to "black magic". Jay chalks it up to PEBKAC (Problem Exists Between Keyboard And Chair).
* If all else fails, whine to Ben.
Well, this week, while following these instructions, we hit two snags we hadn't encountered before. We couldn't get to our equipment, or to the Atipa gear either. So, call Jay. Jay wasn't there; he was at another Atipa facility in New Hampshire. Thus, screwed we were. But we didn't yet know to what degree.
So, we then called Jay's back-up in KC. After thoroughly exhausting our ideas about what the problem could possibly be, our crack team of technicians noticed something peculiar--the router was gone. Not gone as in "not working", or even gone as in "smoldering in a heap in the corner". This particular "gone" was as in stolen. Snatched. Pilfered. An extremely neat and tidy little 1U rack space, opened up for us involuntarily by a third-party technician. It was gone.
Fortunately, upon noticing this trifling bit of technical detail, we were able to call in the local router-jock-for-hire company (who did a great job under odd circumstances, but whose name I've since forgotten), and they configured a spare, which they hid in a somewhat less conspicuous spot. Considering that we were shy a router and had to involve a hired-gun router jock at the last minute, I figure 7 hours really isn't that bad. Hell, Mike's broken the build for longer than that, and more than once!
Anyway, that was the excitement for last week. Well, part of it...
This week's excitement technically started last Friday, when BellSouth installed our new phone lines at our new digs. Since none of us have actually been there yet, I'm going to give them the benefit of the doubt and assume that they are installed, up and working. Why give them this benefit? Only because they so thoroughly proved their efficiency at disconnecting our current phone and xDSL connections in our current office space.
Evidently, every company that ever moves always moves their service. They never install new service at a new facility and leave their existing service in place to allow for overlap. Evidently.
So we found out late Friday evening that the data service was down. Since it goes down with such shocking regularity, we didn't think much of it. When it was still down all day Saturday, we called BellSouth FastAccess support (1-888-321-2375, ask for Sherard and tell him Shane sent ya) and were told that BellSouth was "experiencing problems in the Raleigh area". So, we thought nothing more of it. On Sunday it still wasn't up, but it was easier to assume we just needed to reboot the modem than to actually drive in and verify it. Bad move #1.
When we got in Monday morning, we found out that both data and voice lines had been disconnected. The FastAccess billing folks told us that what was a T-order (To:) had been completed as a T&F-order (To: and From:), meaning that our service now existed at the new office. Which is great for the painters and carpet-layers who are currently occupying that space. We, in the meantime, were officially screwed, with a big ol' BellSouth seal of approval.
So the lady at BellSouth billing who I had to deal with (who was very pleasant and was trying to help) actually got our phone line repair expedited and said that she'd turn the xDSL over to the xDSL support group. At this point, I had no idea how badly I would be in need of an xDSL support group...
Shane: Hi, my name is Shane, and I'm an xDSL user.
So, 3:30pm Monday rolls around and finally--dial-tone! Sweet, sweet dial-tone. And no xDSL.
So, I call BellSouth FastAccess support and talk to someone who says that they can't put xDSL service on that line because there is a disconnect order on it. I ask how they can take that order off, and they tell me they can't, not without disconnecting the line, of course. If that's not an option, they say, we'll have to re-provision xDSL, which will take a minimum of another 4 days. This is despite an earlier rep telling me that it was just a matter of "turning it on" once the phone service was in.
Unsettled and not content with that answer, I hang up, call right back, and get someone who knows what they are doing. Or at least that's what I assumed, since they were telling me what I wanted to hear. After 2.6 hours on hold (check the logs), and after arranging a call back on my cell so I could at least go home, my man Sherard calls me back and says no sweat: all I've got to do is call in the next morning at 8am and tell billing to turn it on. Piece of cake. So until 8am, I drink hard and sleep well.
The day is now Tuesday. The time is 7:45am, and I'm in my car on the way to work, anxiously watching the clock on my dashboard so I can call in right at 8am. At 7:59, I start dialing, to allow time to get through the menus. 8am hits, and I'm on hold again, being advised that an operator will be with me in less than one minute. 2 minutes later (but I'll let it slide), I start my spiel again. Finally, I'm advised that whoever told me all I had to do to get my service back up today was call in just didn't know what they were talking about! I truly hope Sherard is reading this, to hear just how quickly his co-workers turned on him. They tell me that the best they can do is 4 days. So I play the ultimate trump card--"Can I speak with your supervisor?"
Mitchell hops on the phone roughly 5 minutes later, totally unapprised of my situation, so again I roll with the spiel. He confirms that Sherard (without mentioning a name) was certifiably nutso, but says he'll try to help. He puts me on hold for 15 minutes while trying to reach xDSL provisioning, who tell him we can't do anything until we cancel our service cancellation order (the "F" in T&F), which we never placed in the first place. So, to make everyone happy, we cancelled our unordered cancel order. And now I'm getting dizzy.
Finally, they say "hopefully, sometime today". I then get to the office to find Ben on the phone with them as well. After trying the whole thing from another angle, we end up with the same answer. We then focused on bringing up a dial-up connection from our NAT gateway so we could at least get email. God bless wvdialconf, possibly the greatest utility ever written for the Linux platform. We were soon up and running, but on the only dial-up line in the office, which was also the line BellSouth was going to call back on. Nonetheless, we went forward and dared them to call us.
After another full day of being down (except we now had a shared dial-up connection and an rsync-ed copy of CVS so we could work), I left at 5pm (?!?!?) to write this from home.
Anti-climactic Conclusion: I just checked and the connection is now up. The only remaining questions are: how much is BellSouth going to charge us for a disconnect and an expedited re-connect, and how much is Mindspring going to charge me for leaving my dial-up account nailed up all day. And of course the ultimate question: Do the lines at the new office really work?
Current odds around the office are 8:1 against it, and I'm taking as much of that action as I can get. And besides, if they aren't up, I might at least get to break the news to Sherard about his fickle co-workers...
New Releases Pending:
With some successful merging of development branches under way as of today, we anticipate a 0.7.1 release as early as Friday of this week, based on tomorrow night's CVS snapshot.
If that goes well, we'll likely have a 0.6.2 stable release, including RPMs, available sometime early the following week.
These new releases will include some bug-fixes and more Web UI functionality. They'll be worth the download for the bug-fixes, but you'll stay for the functionality.
Watch for these later on. They'll also be announced on Freshmeat, as always.
Office Move - Still Pending:
Some minor construction still going on, as well as questionable data service. The furniture is supposed to be delivered on the 20th or so, and we've got some hardware that's supposed to show up not too long after that, so we'll probably be in later that week.
The offices are going to be a refreshing change of scenery for us. It gets old when the most interesting part of your work environment is the fact that you are between the high school and the mall. We can't count on power, and we sure can't count on phone service, but we can bank on the steady stream of high school kids showing us a) their baggy pants, b) their cool tattoos, c) their hair cuts (tres chic), or d) just how old we really are.
What will we miss about the current offices? Oh, there's plenty. For example, the family of ladybugs that lives in the window track (yes, we work in a building where the windows actually open!). And of course, the rust-stained ceiling tiles on the second floor of a three-story office building. How'd they get stained? Your guess is as good as mine. And I would be remiss not to mention old Mr. Chalky, the faint chalk outline on the floor left by the prior office resident.
But enough of my rantings, don't we actually work for a living?
Coding Projects Underway:
* Snort Integration -- Initial design work is underway, with some pre-alpha functionality demo'd in Perl. Need to do some serious nuts-and-bolts analysis of this integration before proceeding. Still very early in this effort.
* Solaris Port Postgres Procedures -- Underway. No update.
* Postgres for NT -- As far as we know, this will work, but we still haven't heard back definitively from someone who has tested it. There are some additional hurdles to jump for the Win32 platform, now that we have a dependency on a portmap service for NT...
* Portmap for NT -- There is one that ships with NT/2000 that _should_ work, but we haven't tested it. There is another one referenced at http://www.plt.rwth-aachen.de/ks/english/oncrpc.html which is basically from the same project as the Java RPC libraries we are using. This is probably worth a look for those of you interested in running on NT.
* SNMP Poller/Data Collection -- The Web UI is alive, and we are talking about some tweaks to the default RRD formats. Thoughts on this? Let us know.
* Event DTD -- Changed yet again.
* User Interfaces -- Some bug fixes are in. Others pending. Larry's still adding features/functionality to the Web UI.
* SCM UI -- Replaced with "./opennms.sh scm status"
* LDAP Poller -- We're in the infancy of this one. If you want in, let me know.
* Maji Prelim Work -- Rick is building Perl code that is successfully parsing MIB files. Check him out, in all his glory, on the "events" list.
* Notification Configuration -- Actively being moved to the Web UI.
* Swing Interface -- Fighting random oddities. Proceed with caution.
* Discovery/CAPSD/Database Review -- Revisiting the way Discovery and capsd communicate, verifying that stuff is accurately written to the database, and adding some maintenance functionality we didn't have previously. Mike's the man!
Upcoming Road Shows
Hopefully, we'll have to add a regular section on "Seeing OpenNMS in Print"! If you aren't on Network World Fusion's email list on network and systems management, you missed their article on ten cool open source network management tools, which mentioned us. Kind of a goofy article overall, but hey, it's nice to see the name in print.
There's also a nasty rumor about this month's issue of Enterprise Linux magazine, but I'll believe it when I see it...
On with the road shows...
* May 5th - Twin Cities LUG, Minneapolis, MN
* May 10th - Boulder LUG, Boulder, CO
* June 1st - NOVALUG BBQ!! Fire-eaters Unite!!
* June 2nd - Northern Virginia LUG (NOVALUG), Alexandria, VA
* June 11-15 - OpenView Forum 2001, New Orleans, LA
* July 23-27 - O'Reilly Open Source Convention, San Diego, CA
For additional details on these appearances and others, check out the web site at http://www.opennms.org/sections/opennms/events
Early Adopter Program Status
Jeff has had some minor successes. At one site, Jeff was fighting with notification (which is in the product and works, thank you very much) and was having some problems with false outages. After he tweaked a few parameters he hadn't tweaked before (maybe hadn't even SEEN before), we fixed the false outages, and may also have exposed a misconfiguration that could have been causing other problems. Gotta love the minor wins!
We've added another site to the EAP program, and are getting close to a saturation point. If you or your company may be interested in participating, go to the web and fill out the form. Luke and Jeff will be in touch.
The Wish List
Later this week, we'll begin working with our first potential contributor funded by a government grant. That whole deal is not yet final, so we're not resting on those laurels yet, but hey, if Uncle Sam wants to give us some money, all we need is an Ethernet jack in Cheney's pacemaker, and we'll do our best to help out where we can.
Otherwise, on with the list...
* In the 0.6.x release (and CVS), check out the TODO file
* More Data Collection configs wanted for DataCollection.xml
* Any interest in more TCP pollers? Let us know (or better yet, build one yourself...)
* LDAP Poller
* nmap Poller (That idea came in via email this week. Cool!)
* Documentation and development your game? How about a white paper on how to extend OpenNMS with custom pollers, custom configs, and/or your own scripts/code.
* Testing on new, exciting platforms is always appreciated. Somebody want to mess with the Cygwin port of our Postgres stored procedures and see where we stand?
* Any additional help we can get proving our documentation either right or wrong is appreciated. Thanks.
* Got any creative applications for OpenNMS that we haven't considered? Let us know!
* A security analysis of OpenNMS?
Following the first section, I'm just about all ranted out. So I'll take this chance to catch up on the comments we've been receiving regarding name resolution and dependencies between an NMS and external DNS servers.
We've discussed several options: running a caching DNS server locally (which still leaves you dependent on an external DNS), building our own local /etc/hosts file for exclusive name resolution from an nslookup or zone-transfer script (kludgy at best), and finally the solution we've pretty much settled on, which comes pretty close to Roger Zenker's description. Somebody buy that man a beer.
In brief, what we're currently thinking about doing is resolving the IP address to a name as we do the capabilities check on a node at discovery time, and writing that name to the database, then refreshing that name during the capabilities re-scan, which happens, by default, on a 24-hour interval (but is configurable). Of course, we'll also provide a utility to force that change if you need it to happen prior to the re-scan.
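For readers who want to picture the plan, here is a minimal Java sketch of "resolve once at the capabilities check, store it, refresh at the rescan". It is not OpenNMS code: the `NodeNameCache` class, the in-memory map standing in for the database, and the pluggable resolver are all assumptions made for illustration. The default resolver uses `InetAddress.getCanonicalHostName()`, which does a reverse lookup and hands back the bare address if that lookup fails.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: resolve an interface's IP address to a name during the
 * capabilities check, store it, and only re-resolve once the rescan interval
 * (24 hours by default in the plan above) has elapsed, or on a forced refresh.
 * The HashMap stands in for the database table.
 */
final class NodeNameCache {

    private static final class Entry {
        final String name;
        final Instant resolvedAt;

        Entry(String name, Instant resolvedAt) {
            this.name = name;
            this.resolvedAt = resolvedAt;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();
    private final Function<String, String> resolver;
    private final Duration rescanInterval;

    NodeNameCache(Function<String, String> resolver, Duration rescanInterval) {
        this.resolver = resolver;
        this.rescanInterval = rescanInterval;
    }

    /** Default resolver: reverse DNS; yields the textual address if no name is found. */
    static String reverseLookup(String ip) {
        try {
            return InetAddress.getByName(ip).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return ip; // not a parseable address; keep what we were given
        }
    }

    /** Called at discovery and at each rescan; answers from the table in between. */
    String nameFor(String ip, Instant now) {
        Entry e = table.get(ip);
        if (e == null || !now.isBefore(e.resolvedAt.plus(rescanInterval))) {
            e = new Entry(resolver.apply(ip), now);
            table.put(ip, e);
        }
        return e.name;
    }

    /** The "force it now" utility: re-resolve regardless of the interval. */
    String forceRefresh(String ip, Instant now) {
        table.remove(ip);
        return nameFor(ip, now);
    }
}
```

In production you would construct it as `new NodeNameCache(NodeNameCache::reverseLookup, Duration.ofHours(24))`. Between rescans, `nameFor` answers from the stored table, so routine polling never waits on DNS. One wrinkle the sketch ignores: if DNS is down during a rescan, the bare address overwrites the stored name; a fuller version would probably keep the previous name in that case.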
Your ideas were all helpful (except for the questions about how we, Snort, and Samba use different files for syslogging ?!?!), and all figured into the direction we're currently pursuing. Thanks again for the open discussion and stay tuned--there are plenty more questions to come, like this one:
What's the best algorithm for associating a name with a node? DNS only associates a name with an IP address, which is associated with an interface, and by definition, a node can have more than one interface. So which is the right name to associate with it? We're familiar with OpenView's algorithm, which seems reasonably good. (Briefly, its node name is the first available of: the SNMP sysName, the hostname of the software loopback interface, or the hostname of the interface with the lowest-numbered IP address.) But are there situations it doesn't address well? So, whaddaya think? Please take your responses to the [discuss] list.
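To make the OpenView rule concrete, here is a hedged Java sketch of it. The `NodeNamer` and `Iface` names are hypothetical, and two choices are assumptions rather than part of the rule as stated: loopback interfaces are skipped in the lowest-address step, and when the chosen interface has no DNS name, the bare address is used. Note that "lowest-numbered" has to mean a numeric comparison; as strings, 10.0.0.10 would sort ahead of 10.0.0.9.

```java
import java.util.List;

/**
 * Hypothetical sketch of the OpenView-style rule: the node name is the first
 * available of
 *   1. the SNMP sysName,
 *   2. the hostname of the software loopback interface,
 *   3. the hostname of the interface with the lowest-numbered IP address.
 * Class and field names are illustrative, not OpenNMS or OpenView APIs.
 */
final class NodeNamer {

    /** One interface on a node; hostname is null when DNS had no answer. */
    static final class Iface {
        final String ip;
        final String hostname;
        final boolean softwareLoopback;

        Iface(String ip, String hostname, boolean softwareLoopback) {
            this.ip = ip;
            this.hostname = hostname;
            this.softwareLoopback = softwareLoopback;
        }
    }

    /** Dotted-quad IPv4 to an unsigned numeric value, so 10.0.0.9 sorts below 10.0.0.10. */
    static long ipv4ToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    private static boolean present(String s) {
        return s != null && !s.isEmpty();
    }

    /** Returns the chosen node name, or null if the node has no usable interfaces. */
    static String nodeName(String sysName, List<Iface> ifaces) {
        // Rule 1: SNMP sysName, if the agent answered with one.
        if (present(sysName)) {
            return sysName;
        }
        // Rule 2: hostname of the software loopback interface.
        for (Iface i : ifaces) {
            if (i.softwareLoopback && present(i.hostname)) {
                return i.hostname;
            }
        }
        // Rule 3: interface with the lowest-numbered IP (numeric comparison).
        // Assumption: loopback interfaces are not candidates here.
        Iface lowest = null;
        for (Iface i : ifaces) {
            if (i.softwareLoopback) {
                continue;
            }
            if (lowest == null || ipv4ToLong(i.ip) < ipv4ToLong(lowest.ip)) {
                lowest = i;
            }
        }
        if (lowest == null) {
            return null;
        }
        // Assumption: fall back to the bare address when that interface has no DNS name.
        return present(lowest.hostname) ? lowest.hostname : lowest.ip;
    }
}
```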
And as always, thanks for your support. Not you, Sherard.
XXXs and OOOs to BellSouth,