Network Outage on 9/2/2008

Here is what we know so far about the network outage on Tuesday, 9/2/2008:

Around 8:30 AM, ITS staff began receiving email notifications of unusually high activity on the electronic portfolio system, which was stressing our databases. One of the databases was restarted at around 8:40 AM to address this, and everything returned to normal shortly afterward.

Then, at 11:00 AM, many ITS staff members noticed that they could not access the Internet. When the networking staff began looking for the cause, none of the usual devices showed any signs of trouble. Over the next few minutes, ITS staff members worked together to locate this elusive issue and concluded that one portion of the network could not reach the Domain Name System (DNS) servers, which translate a name such as google.com or www.wesleyan.edu into an IP address. Our attempts to reach Cisco, our network equipment vendor, at that time were not successful; we have since had many conversations with them, and they are analyzing the data collected during the outage. We then began restarting various network devices to isolate the issue. Finally, restarting one device, the Core Switch, cleared up the problem at about 1:00 PM.

Unfortunately, because of the interdependencies between systems and network devices, the e-mail system serving both students and faculty malfunctioned when one of the network devices was restarted. This issue was resolved at approximately 3:00 PM.

We also received reports of wireless access problems around 5:00 PM; these were resolved shortly thereafter.

We are continuing to investigate these problems with the help of Cisco and are consulting with colleagues at other institutions to see whether they have experienced something similar. We will update this blog as more information becomes available. The investigation may require tests that could cause brief network interruptions; we plan to schedule these during off hours and on weekends.

We understand that these service interruptions are very disruptive to all of us, and we in ITS work hard to prevent them. We apologize for the inconvenience this caused all of our users.