Jeremy Grosser
synack@csh.rit.edu

PROJECT TITLE

Distributed monitoring of Jabber networks

SYNOPSIS

As the Jabber network continues to grow in size, functionality, and complexity, it continues to have problems with reliability. An outage on a smaller server often goes unnoticed for hours simply because proper tools to monitor the stability and usability of the Jabber network are not available. The goal of this project is to create a set of automated tools that continually monitor the network's usability. I believe that given a set of reliable tools for diagnosing problems in the Jabber network, administrators will be much better equipped to solve those problems in a timely manner.

BENEFITS TO THE OPEN SOURCE COMMUNITY

The following benefits are expected to be achieved with the successful completion of this project.

1. Reduced downtime on community communications infrastructure.
2. Faster awareness of application level problems.
3. More detailed problem descriptions (e.g. can tell where the system is failing, not just "It doesn't work").
4. Distributed testing from multiple hosts.
5. A modular monitoring system extensible to other applications and protocols.

DELIVERABLES

The following deliverables can be expected upon successful completion of this project:

1. Communication Tester - Establishes a connection to a Jabber server, secures it with TLS or SSL if available, authenticates using SASL, and verifies successful communication with another testing host. It should be possible to run it automatically or standalone for manual diagnostics.
2. Administrator Notifier - Notifies the proper administrator(s) that there is a problem communicating and includes detailed diagnostic output from the Communication Tester. Notification should support at least the following methods: Email, SMS, and Jabber (if a server is available).
3. Control Daemon - Schedules periodic tests of functionality between servers, aggregates test result data, and coordinates with other testing hosts.
4. Report Generator - Creates a Dot format graph to be rendered with graphviz and posted to the XMPP Federation (www.xmpp.net) on a publicly accessible network overview page.

In addition to the above deliverables, the assigned mentor will be able to keep track of this project's progress via regular blog posts as well as phone, email, and of course Jabber correspondence with the author.

PROJECT DETAILS

1. All implementation will be done in Python unless it becomes desirable or necessary to use another language. The exception is report generation and website updating, for which a shell script may be better suited.
2. The xmpppy library will be used whenever possible to avoid rewriting existing code. Modifications and/or additions may be made to this library to accommodate special circumstances.
3. Notification methods, test cases, and authentication methods should all be site configurable through a simple configuration file.

PROJECT SCHEDULE

1. Research (1 week): Study the xmpppy library and the Jabber RFCs in order to get a better idea of what kind of testing functionality is readily available for use in the Communication Tester module.
2. Communication Tester (2 - 3 weeks): Design and implementation of the Communication Tester module. By the end of week 4, there should be a standalone program that can establish a connection, authenticate, and send a test message while reporting any errors or possible protocol violations encountered during the process.
3. Control Daemon (3 - 4 weeks): Design and implementation of the Control Daemon. This should be broken down into the following parts.
   a. Scheduler - Executes the Communication Tester at intervals specified in the site configuration.
   b. Aggregation - Maintains a database containing the last known state of all tested servers.
   c. Distribution - Performs tests on behalf of other testing hosts and relays the results back to them.
4. Administrator Notifier (1 - 2 weeks): Design and implementation of the Administrator Notifier. It will be called by the Control Daemon if a problem arises. Notification methods should be specified in the configuration file.
5. Report Generator (2 - 3 weeks): Design and implementation of the Report Generator. It should generate a Dot format graph based on the aggregation data provided by the Control Daemon. The Dot graph will then be rendered by graphviz and posted on the XMPP Federation website.
6. Overall Improvements (remaining weeks): Revisit the design and implementation choices made in each module and make modifications as necessary or desired. Heavy usage testing should be performed with real deployments, and any bugs found should be fixed. Additional features may be considered at this time.

BIOGRAPHY

I am currently pursuing a BS in Applied Networking and Systems Administration at Rochester Institute of Technology. Throughout middle school and high school I taught myself (using Google) web development, Unix, and programming skills. I spent the summer of 2005 pursuing an interest in distributed computing at Corning Community College in Corning, NY by helping to build a small distributed systems lab. I have not contributed to any major open source project before but have been an advocate of open source activity for a number of years.

AMOUNT REQUESTED

I am submitting this proposal as part of the Google "Summer of Code" project and as such am requesting the full stipend given as part of that project.
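
As a rough illustration of the Report Generator deliverable, the following Python sketch shows one way aggregated test results might be turned into Dot text. It is not a committed design: the function name render_dot, the layout of the results dictionary (a mapping from a (source, target) server pair to a pass/fail flag), and the example hostnames are all assumptions made for this sketch, since the proposal does not yet define how the Control Daemon stores its data.

```python
def render_dot(results):
    """Build a Dot digraph from {(source, target): ok} test results.

    The result layout is an assumption made for this sketch; the proposal
    does not define how the Control Daemon stores aggregated data.
    """
    lines = ["digraph jabber_network {"]
    # Declare every server that appears in a test pair as a node.
    servers = sorted({host for pair in results for host in pair})
    for server in servers:
        lines.append('    "%s";' % server)
    # One edge per tested pair, colored by the last known test outcome.
    for (source, target), ok in sorted(results.items()):
        color = "green" if ok else "red"
        lines.append('    "%s" -> "%s" [color=%s];' % (source, target, color))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hostnames, used only for illustration.
    sample = {
        ("a.example.org", "b.example.org"): True,
        ("b.example.org", "a.example.org"): False,
    }
    print(render_dot(sample))
```

Text produced this way could be saved to a file and rendered with graphviz, for example with "dot -Tpng network.dot -o network.png". Using a directed graph keeps the two directions of a server pair separate, so a failure in only one direction remains visible.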