APS10: CMS Success at Princeton University

John Wagner, Web Guy, Princeton University


The audio for this podcast can be downloaded at http://highedweb.org/2008/presentations/aps10.mp3


[Intro Music]

Announcer: You’re listening to one in a series of presentations from the 2008 HighEdWeb Conference in Springfield, Missouri.

John Wagner: As was described, this is going to be a little weird. It’s a little more technical than you were probably expecting. I looked at the track description and sort of said, “Well, let’s try to fit this.” This talk was given about two and a half weeks ago at the Roxen Users Conference in Sweden. And, unfortunately, our CMS is Roxen, and because it behaves a little differently than other systems, you will hear about Roxen a little bit throughout. It’s not really meant as a sales presentation, but without the technical details, you can’t figure out what’s going on.

This was the blurb that was put out on the website. I summarized it this way: some of the reasons we think we were successful, the challenges we face daily in running things, and the challenges we see for the future. Or: what did we do right, what do we do daily, and what do we do for the future?

Let’s try to start with a short history of Princeton. I don’t know how much you know about Princeton. This is the original building. We moved into it about the time of the Revolution. There are cannonballs in this building from our side shooting at it, because the British were inside. And they apparently left the building when we put a cannonball through the portrait of King George. We took out his head, and that’s when they left the building.

This, however, is the campus today. That other building was the whole campus then. Does this remind you of the growth of your websites? So, I’d like to show you a little bit of what we looked like: about 20 pages in 1996. This is what we called the core site, which we always said was around 200 pages, right before we went to Roxen. And this is what Roxen looked like.

Now, we are lucky to have the Halloween colors as our colors, so it makes it more interesting for our graphic designers to come up with a good idea. And this is us today. This change was done in less than two months for a site – oh, you’ll see the page counts later.

We are today running 11 instances. If you are used to Apache, this is the equivalent of 11 virtual hosts. We’re off the bottom maybe over here – departments that have www in them. That’s because we use three instances to actually serve www, and we do that through the hardware we have in place.

This is a rough idea of our page counts. Now, you should remember that in 1996 we started with 20. We were at about 200 in ’04 when we did the conversion. This is where we are today. Roxen is based on XML. There are no pages as such in this system; all there are are XML files. So this is the count of XML files, to give you a rough idea. And we’ll come back to this later.

Now, how did we get where we are today? We started out with a large RFP that went out. I said 70; I know it was over a hundred, but I was certain 70 was good. We got back everything. We cut it down to five for a bake-off. We said to them, “Here are three or four pages from our current site. We want you to replicate them in your system. And then when you come in, you will teach us how to edit those pages.” And that’s how the decision was made about which CMS to use.

The large evaluation – we had 25 people in this group. And as far as I know, this is the first time in university history, maybe the last time, that we actually had a unanimous decision about anything. All 25 agreed on the system selection.

Then we really goofed up. We defined a process around it, and we became the third-party consultant in what we were doing. We said, “Okay, you will come to us and only us; or if you want to go it alone, yes, you can go it alone, but don’t talk to us. Those are your two choices.” If you wanted to go to an outside vendor, a third-party vendor, you could do that, but there was only one we’d let you go to.

And the inside folks ignored maintenance issues entirely. If you can imagine taking all of your campus’s sites, putting them together in one bucket, and then writing one CSS file for all of those different looks, you have an idea of what our CSS file looked like. It got really complicated.

Final result: you couldn’t develop fast enough to make anybody happy. The coders alone couldn’t do it; the code finally became too complicated for the developers. I sat in two meetings of a group that was supposed to develop our fourth template. It took them three months to reach the point where they couldn’t develop it. It never got created. This was eight to ten people in a room, once a week, for three months. Total mess. And essentially, because of the approach, all sites were custom. So we got lucky.

I have to admit there was no intelligence in this process. We simply got lucky. The people who were doing the work were part-time positions that ran out of funding, and the work had to move somewhere else. And that somewhere else, the other group, said, “We can’t do this. We can’t live with this. Here’s how we’re going to do it.”

So this was the corrective they put in place. We still had a central development group, but they were chartered to do two things: development and education. We took out all of the custom stuff we’d been trying to do and said, let’s try to be vanilla. That central group really started doing user support. And they thought about it and said, “You know, you really can’t do handholding all the time. You have to make users capable of holding their own hands.” And we allowed multiple ways to develop a site.

I’m going to go through this in backward order. We have a low-cost way for you to do your site. You come in, you talk to the group. It’s essentially a cookie-cutter: there are several different ways the site can look, you pick the cookie-cutter you like, we send you out the door, and you get it for virtually zero.

If you want to do some custom stuff, you can get it built. This is pretty low cost compared to most other options. You can also go to the same group, or another group, and essentially develop from scratch. They’re going to charge you time and materials. If you’ve ever done that, you know that’s not cheap, but it can produce a very nice site. Or you can go to external vendors. We have three now.

The last two wanted our business enough that they brought Roxen, which is headquartered in Sweden, in to train their employees before they would actually be allowed to do sites. So that was a little unusual among the vendors we have. The sites we’ve had from all of them have spanned everything from our alumni departments to in-house academic departments.

For user support, they recognized that the most important thing you can do for a user is teach them to fish. So we have three levels of training: an introductory course, which goes from how to just edit a page up to the beginning edge of programming; an intermediate course, which is how you program in this system; and an advanced course, which is how you program databases in the system.

We created an email list. The fellow who’s mainly in charge of this stuff now has worked really hard to establish a community, because he sees the community as what makes it possible for his group to actually do the work. The community starts to support each other.

We documented standards. The original group that was doing this was in our communications department. They have documented the standards they want to see and, in essence, have passed them off to IT. Now, instead of going through reviews by them, any site created by IT is basically just accepted, because we can say, “Okay, we met your checklist.” Any third-party site has to go through communications review. We have bimonthly meetings so they can bitch at us. Those are really good. They help a lot. They keep progress going.

The big thing we got out of the new approach was that we didn’t do anything custom. Literally, the week before I went to Stockholm, we had to hire somebody from Roxen to undo a change that was in a custom module on one of the older sites, what had been the original core site, because we didn’t have the expertise. But there was a really interesting secondary benefit which, at the time, was really ancillary but has now become core: where do you put your functionality? How do you build it?

In our case, since we start with XML, it’s all XML and XSLT processing. So, instead of putting custom code in in some language, we drop an XSLT file in. Process the tags, produce the output you want, and there’s your page. So we have moved to actually putting functionality into the directory structure. Here’s a directory that helps you get information from the LDAP directory and lists the users in your department.

Here’s the directory that helps you get courses; various sorts of modules that come out as directories. And if a user says, “I really like that functionality,” okay, we drop it in, and they have it. It’s easy to fix, it’s very common, and we no longer have to worry about knowing Pike, which is the language this thing is written in. And the really neat part is that now users actually watch the Roxen people and say, “What’s the new functionality they have? Oh, they’ve got – we want it. When can we have it?” So we get a push from the users to move forward.
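
As a concrete illustration of the drop-in idea, here is a minimal Python sketch using lxml; Roxen applies XSLT natively, so this only shows the shape of the concept, and the file names are invented:

    # Minimal sketch of XML-plus-XSLT page rendering. The people.xml
    # and people.xsl files are hypothetical stand-ins.
    from lxml import etree

    source = etree.parse("people.xml")                  # XML data behind the page
    stylesheet = etree.XSLT(etree.parse("people.xsl"))  # the drop-in XSLT file
    result = stylesheet(source)                         # process the tags

    print(str(result))                                  # the rendered page output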

Central development support group. This was the big change, because we had done a central development group before and it kind of failed. But there are some good things about a central group. It does give you a focus: where do the complaints come to? Who do you ask if you want something that’s not in the system? How do you go about getting support?

Not necessarily who actually gets it done, but who’s in charge of making sure a process happens and it gets done. Those are really big wins. The other win was that we now have a community of people who can do this stuff and, in fact, can I put you on the spot? We have wonderful people sitting in. If you have questions, ask Daisy.

Net result: we got the software in ’04, and we had two sites up by about the beginning of ’06. Then we switched what we were doing. This is where we are today: 40 active departmental sites, 90 to go active probably in the next six months, and then 40 more in planning. So, by the end of next year, we expect to be close to 200 departmental sites, in a period that spans roughly two years of real work. And for us, that has been the real payoff for saying, “I think we’re finally doing something right.”

I’ve always described the university, and I used to use this all the time in the early 2000s, as very much like a bunch of feudal kingdoms. Whenever we want to talk to somebody, we send out a knight with the lance down. And that was our communication pattern. So to have all these people come to IT, who they dearly love, is sort of close to a miracle: the fact that they wanted to join and were willing to work together in a group.

So this is actually not a credit to the group that I’m in, because I’m in a slightly different group – but we’ll get to that later. This is a real credit to the folks in the web group who have been doing this, beating the bushes and bringing people in. Now we get to me.

I’m in a group that can best be described as an infrastructure group. Although the guys in the group have experience doing operating system stuff, we don’t do operating systems. We don’t do software. We don’t do content. That’s where this talk is funny, because we’re in applications. What’s my application? My application is a content management system. The whole thing.

Our job is to make sure it stays up. That it’s running. That the folks who are doing the internal development have a system they can develop in. So we start to look at this growth and say, “Okay, what’s the impact of having growth?” We know that everybody wants it up all the time and we know that’s mythical. So how do you get there? What are you shooting for?

This is physically what our hardware setup looks like. We have a shared set of front ends – standard for your usual Apache, you know, it’s just a copy of your website dropped out there – and the editing servers where people make changes. In this case, the one thing that’s different is that the front ends pull the changes from the backend.

And since the underlying structure is XML files, instead of sending a copy of the site, we send a copy of the change files, and the front ends reapply them, as if they were edited there, when they get the update. To keep things up, the load balancers sit in the middle. There are two load balancers there, looking at each other. If one goes down, it fails over to the other.
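
A hedged sketch of that pull-and-reapply cycle in Python; the paths and the apply_change() helper are invented stand-ins, since the real mechanism is internal to Roxen’s replication:

    # Poll for change files once a minute and reapply any new ones.
    import shutil
    import time
    from pathlib import Path

    BACKEND = Path("/mnt/backend/changes")   # where publish drops change files
    SITE = Path("/var/roxen/site")           # this front end's local XML tree
    APPLIED = Path("/var/roxen/applied")     # record of changes already applied

    def apply_change(change):
        # Stand-in: reapply the edit by copying the changed XML file in.
        shutil.copy(change, SITE / change.name)

    def sync_once():
        APPLIED.mkdir(parents=True, exist_ok=True)
        for change in sorted(BACKEND.glob("*.xml")):
            marker = APPLIED / change.name
            if not marker.exists():
                apply_change(change)
                marker.touch()

    while True:
        sync_once()
        time.sleep(60)    # the normal one-minute update cycle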

What this allows us to do is, I can turn around to my OS people – and this is the funny thing: if you wander around and ask people, “What do you think 100% uptime means?”, my OS people will say it means that we have at least one weekend, and four hours every week in the morning between five and seven, when we can take the systems down. That’s 100% uptime to them.

If I talk to my editors, 100% uptime is: whenever I want a change in an article, it’s got to be there. So we have these groups that don’t go together.

If you talk to our DBAs, they’ll tell you the same thing about when to take things down. So we’re set up with the load balancers to allow a front end to be taken out while the other front end continues. That makes our OS people happy. It says to them, “You can do this during business hours. You don’t have to do it at five to seven in the morning.” So they’re really happy about that. It doesn’t really make our editors happy, because we still have to take systems down when we want to make OS changes.

If I want to change the Roxen software, we have to take the system down. But we figured out ways to cut that down to three minutes, so they’re not too unhappy about that. But there really is this question of what ‘up’ means. Is the hardware up and running? Our OS people think that’s up. Hardware’s up, not a problem. Is the web server running? Well, from my viewpoint, since I’m trying to provide web services for these people, it’s a problem if it’s not. Do you get a page back?

If you’re an editor and you’re editing a page and you go say publish, does it? And as you can see, if you remember the diagram, my editing boxes are separate from the front ends. So I might say, “Geez, the web is up,” and the editors say, “No, it’s not – the editing box is down.”

What if you have one of those pages that pulls something from a backend? We’re now starting to put courses on departmental sites. We don’t drive the process of when the databases are up; we just use them. Well, the databases go down, and those pages sometimes come up with the course section missing. Is the system up? You could almost argue it either way.

What if you put in the extra code that says, where the courses should have been, “Sorry, we’re doing system maintenance”? Does the user think that’s up? Hardly ever, although a lot of our people inside think that means it’s up. So what I use and rely on is response time monitors.
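
As an aside, that “extra code” might look something like this minimal Python sketch, where get_courses() stands in for the real backend call:

    # Degraded-page case: the database is down, so render a placeholder
    # where the course list should have been.
    def get_courses():
        raise ConnectionError("course database down")   # simulate the outage

    def course_section():
        try:
            items = "".join(f"<li>{c}</li>" for c in get_courses())
            return f"<ul>{items}</ul>"
        except ConnectionError:
            return "<p>Sorry, we're doing system maintenance.</p>"

    print(course_section())   # the page renders, but is the system "up"?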

This is the response time of www.princeton.edu. As you can see, we have bad days and good days, good hours. This is in milliseconds, so this spike is almost a five-second response time. The rest of it is way under one second.
We look at this and say our real goal is that the whole day looks like that little bit of stuff down at the bottom, with none of the spikes. That’s the target for our group for saying “it’s up,” along with the fact that we’re being told the pages are coming back.
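
The graphs come from dedicated monitoring tools, but the underlying measurement is simple. A minimal probe sketch in Python (the URL is real; the once-a-minute loop is illustrative):

    # Time one fetch of the homepage and report it in milliseconds.
    import time
    import urllib.request

    def probe(url="http://www.princeton.edu/"):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
        return (time.monotonic() - start) * 1000.0

    while True:
        print(f"{time.ctime()}  {probe():.0f} ms")   # spikes are the bad hours
        time.sleep(60)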

But that takes us back to this. Where did all these pages come from? I’m saying that’s basically a couple of years’ time from almost nothing to this. Well, there’s an awful lot of people time in here: people sitting there editing, doing all sorts of stuff. Some of these are archives that they’ve asked us to pick up somewhere and get into the system; some are where they’ve gone off to a third party who builds something great in the system. So there’s a lot of work in all these pages. That means I must have somebody doing updates somewhere. And this is what our daily update cycle looks like.

As you can see, we have some fairly hot days – 1,600 updates in one day. This is across 13 sites. But still, that’s quite a few updates. And you can also see these other wonderful trends. I haven’t done the numbers yet, but I’m afraid it goes like this: just continuing straight up. This is the same numbers spread out a little differently, based on which one of our servers is being updated.

This is our departmental instance, the one I gave you the numbers on, of how many we have. We’ve been adding editors and editors and editors, because usually every department has at least two editors. And they don’t necessarily work together, except for making sure they don’t both put the same article in.

So there’s a lot of processing going on here, and the processing rate drives what your complaint rate will look like. Now, this was a fun one, and you need to know that I did something tricky here: I did this graph without including Saturday or Sunday. So this is just the business week.

We have some people who need to work really early. And then we get to about 10:00, everybody’s in. They get their stuff done, they go to lunch, come back, and then we have the ‘Oops, I forgot something.’ That’s the average. The peak days, I mean, you can see the peak matches the shape of the average. It’s not a surprise, because we know how people work. But when you’re trying to make sure the system is up, these are the things that cause your editors to say, “No, it’s not. I can’t make my changes in the time I need to. It’s taking too long.” That’s down.

I like this title because it fits the university so well. If you’re faculty, you must publish. And if you’re not faculty, it’s because they put that load of stuff on your desk that they told you to get out.

High publishing rates mean that somewhere I’ve got to be spending a lot of hardware horsepower. And what I want is to have my editing community, my web community, happy about it. The question is where that time goes. But that load does drive your complaint rate. The right-hand graph was actually the second front-end server. The top one is live; those are our servers right now.

This would be 50% of a four-processor machine, meaning essentially two processors are full. Our front ends are running lightly enough that we could take them and put them both on one box. There’s a lot of breathing room there. This is almost never our problem.

We have two editing servers. This is the older one. Because of the software Roxen’s built on, it can’t run on more than one processor at a time. If you’re used to Apache: Apache forks a bunch of children, and each child runs on only one processor, but it has 50 children out there, so they can spread across all of them. Roxen isn’t designed that way. It doesn’t fork children. If you want multiple child processes, you have to start multiple nodes. We don’t. We run one editing process, because of the way the software works.
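
To make the contrast concrete, here is a toy Python sketch of the two models, nothing Roxen- or Apache-specific: a preforked pool spreads the same work across all processors, while a single process tops out at one, which is exactly the plateau we see.

    # Toy contrast: pooled worker processes vs. one process doing everything.
    from concurrent.futures import ProcessPoolExecutor

    def busy(n):
        # Stand-in for rendering/publishing work.
        return sum(i * i for i in range(n))

    jobs = [5_000_000] * 8

    if __name__ == "__main__":
        # Apache-style: many children, the work spreads across processors.
        with ProcessPoolExecutor() as pool:
            list(pool.map(busy, jobs))

        # Roxen-style: one process, the same work pegs a single CPU.
        for j in jobs:
            busy(j)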

So these things that look like plateaus – that’s an instance that has run out of processor. It’s using one processor at 100% for that whole plateau. And that’s when the phone starts to ring, because the editors say, “Geez, it’s not working.” Now, why do we get that? Because of the way Roxen implements publish. When I’m editing, Roxen gives me a sandbox and I make all my changes in it, and I can look at what the site’s going to look like when the changes get applied. And when I say publish, it goes through and figures out all the files that are going to get changed, puts the changes into a little file – the diff is the changes – and drops it.

Well, this departmental site was the biggest site. It was the 20,000-page one. So it has to look through all of those, because sometimes you make a change on one page that actually invalidates stuff across all the pages, like your navigation. And when it does that, it’s really horsepower-intensive. So we get those peaks from publishing, and that slows everybody down.
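
A toy sketch of why one edit can be so expensive: a shared dependency like navigation forces every page to be re-examined. All the file names here are invented.

    # Each page lists the files it depends on.
    site = {
        "index.xml":   {"nav.xml"},
        "people.xml":  {"nav.xml", "ldap.xml"},
        "courses.xml": {"nav.xml", "courses-db.xml"},
    }

    def pages_to_rebuild(changed):
        return [page for page, deps in site.items()
                if changed == page or changed in deps]

    print(pages_to_rebuild("ldap.xml"))   # -> ['people.xml']: cheap
    print(pages_to_rebuild("nav.xml"))    # -> every page: the expensive scan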

Because of complaints, we actually moved it to a separate server. The only difference between these two boxes is that the second box’s processors are three times as fast. But I still plateaued at 10:00 this morning. That’s the departmental server. This box – we actually moved it, and it still runs out.

So from the viewpoint of “is it up,” I’ve got a lot of users in the departmental instance who are saying, “Well, maybe.” So the question is: how do you fix it when everything’s in one pot? Because that’s our problem. One of the pieces technically under the covers, done by the web group that’s doing the inside stuff, the content stuff, is that they created these pieces of code that are shared among all the departmental units.

And I keep saying to them, “Well, to fix this we have to create more instances so we can run on more processors.” And they keep saying, “But we don’t want to do that because that means maintenance. Then we have to make our changes in every one of them. And every time you split, it means another one we have to maintain.”

So “we’re in negotiations” might be the best description. And the real problem here is: how do you do this without losing your mind managing it? That’s the piece we’re trying to solve.

The other piece we see in our group that has to do with the “is it up” question is that we have four guys, and we’re each on call one week out of every four. The on-call guy’s responsibility is handling all the problems for the group, wherever they are. That allows the rest of us to plan projects in the three weeks of open time.

But the result of that is, if they call and say something’s not running, it’s not usually, “Well, why isn’t it running?” It’s, “I don’t care. Fix it right now.” And there’s this really bad thing about software: occasionally, software just stops. It may be really esoteric. You could probably find the cause in a few days. But from my operational viewpoint, if I restart it, it goes. We all have those Windows boxes on our desktops: it’s back, it’s good, it’s working. Well, we kind of use the same technique here.

Failures always occur at the time you would least like. Big failures occur somewhere between 2:00 and 4:00 a.m., when the phone wakes you up and you start fiddling with it. So the question is: how do you avoid the phone call, and how do you avoid the time it takes to process the phone call, so you look like you’re down less?

Well, we’ve created some standardized web-tool interfaces that allow us to drive the process automatically. We have stuff that monitors those pages: go out, get the page, check for a string at the bottom of it. Did you get the string? Okay, we must have gotten the real page back. No, we didn’t get it? Something’s wrong. Send out a page – a physical phone page, a text message – to whoever’s on call and say, “You should go look.”
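
A minimal sketch of that monitor in Python; the sentinel string, the pager gateway address, and the local SMTP relay are all invented stand-ins:

    # Fetch the page, check for a known string, page the on-call person.
    import smtplib
    import urllib.request
    from email.message import EmailMessage

    SENTINEL = "Princeton University"      # string expected on a real page
    ONCALL = "oncall-pager@example.edu"    # hypothetical pager/SMS gateway

    def page_looks_real(url):
        try:
            with urllib.request.urlopen(url, timeout=15) as resp:
                return SENTINEL in resp.read().decode("utf-8", "replace")
        except OSError:
            return False

    def page_oncall(url):
        msg = EmailMessage()
        msg["From"], msg["To"] = "monitor@example.edu", ONCALL
        msg["Subject"] = f"You should go look: {url}"
        with smtplib.SMTP("localhost") as smtp:   # assumed local mail relay
            smtp.send_message(msg)

    if not page_looks_real("http://www.princeton.edu/"):
        page_oncall("http://www.princeton.edu/")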

We have a set of automatic nannies that handle the case where the software is literally not running: you look at the system and it looks like it was never started. Well, why not? Frankly, we don’t care. We just want to start it again. The nanny does that.
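
In spirit, a nanny is only a few lines. Here is a hedged Python sketch with an invented pidfile path and start command:

    # If the process looks like it was never started, start it again.
    import os
    import subprocess
    from pathlib import Path

    PIDFILE = Path("/var/run/roxen/instance.pid")            # invented path
    START = ["/usr/local/roxen/start", "--config", "demo"]   # invented command

    def running():
        try:
            os.kill(int(PIDFILE.read_text()), 0)   # signal 0: existence check
            return True
        except (OSError, ValueError):
            return False

    if not running():
        subprocess.Popen(START)   # we don't ask why it died; just restart it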

But the other case – the software is up and running but not responding – has to be handled by a human. We have not figured out how to automate that yet. Well, maybe someday. Our concern is the day somebody publishes something that impacts the string our monitors look for to say the page is up or down. If we were automated, we’d say, “Oh, geez, it doesn’t look like it’s coming back,” restart it automatically, and just keep restarting and killing things all the time. So we send a person.

Here’s what these monitors look like. This is what we call our IP rover. It scans every box in our system within one minute. It does really basic checks.

Does the system respond as if it’s been powered on? That’s the basic starting point. It will check to see if an HTTP response comes back. It doesn’t care what it is; it just says, “Okay, I connected, it said it was there, fine.” So these are really basic. But this one tells us when the system is down. If the hardware is down, it places a call, not to us, but to the people who maintain the hardware.
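
Those two checks are about as simple as monitoring gets. A Python sketch, with made-up hostnames:

    # Rover-style checks: is the box powered on, and does anything
    # answer on the HTTP port?
    import socket
    import subprocess

    HOSTS = ["frontend1", "frontend2", "edit1", "edit2"]   # invented names

    def powered_on(host):
        # One ping with a two-second wait: does the box answer at all?
        return subprocess.call(["ping", "-c", "1", "-W", "2", host],
                               stdout=subprocess.DEVNULL) == 0

    def http_answers(host, port=80):
        try:
            with socket.create_connection((host, port), timeout=3):
                return True   # connected; we don't care what it says
        except OSError:
            return False

    for h in HOSTS:
        print(h, powered_on(h), http_answers(h))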

This is our Tivoli monitoring system. It does things like go out and hit a page and say, “Oh, geez, I didn’t get the right response after three tries,” which is roughly three minutes, “let’s call a human in.” And it’s really nice, because with devices like this – iPhones, or iPod Touches with wifi active – you don’t have to be able to read it. You just look for the color red.

And you can sit there in a meeting and just watch it, and it tells you something’s down. It’s actually very interesting to see the response of some customers when you’re sitting in the same meeting with them, knowing something’s gone down, and you bring it back up sort of quietly through your phone, and it comes back, and they’re like, “Oh, it came back. Okay.”

And when we need to get people involved, we have a web interface. This is the nanny system. You can tell which things are not being nannied. I have a friend at work who sees only shades of gray, and I took this design to him originally, when I had green and red on the page.

So the stuff that’s white here was green, to say it was on. And he took one look at it and said, “Couldn’t you just tell me what’s off? It’s really hard to read.” And it was one of those things I would never see: I see the colors, he doesn’t. So we went with just this design, with just the red, for usability.

Things that are listed there can run on this box. Things that don’t have the nanny turned on aren’t required to run on this box at the moment. But the other side of this interface is the current processes. We’ll see how fast it comes back. Again, this is my real live system we’re looking at. It’s not a test box. This is what we’re running with right now.

You get the phone call. It’s three in the morning. You go over to your laptop, you log in, you go to the page – like this guy: oops, it’s not running. You click on start, done, go back to bed. It starts it.

If it’s running but not responding, you’ll actually see entries that look like the others, you know, with the stop button in front of them. So you try the stop button. If that works, you can go back to bed again, because you can just hit start. If it won’t stop, then you actually have to do real work, and you get up.

But that’s what we put in place, not so much because it makes our job easier, as much as it makes the system behave for the customers who come in and expect it to be up. Which leads on to the next and perhaps most interesting part.

Now, as I’ve been moving along here, I’ve gone from a lot of facts to sort of fewer facts. We’re now going to move into a fair amount of speculation, because what I want to look at is the shooter-on-campus scenario. We’ve been using this as our planning case, not the disaster-recovery disaster, the one where you have to go to all the backup tapes you’ve been sending off campus.

The strangeness of the shooter on campus, or some other similarly big kind of disaster on campus, is that all your systems are up and running. Everything is working as normal. It’s just that you have this thing going on that’s really not normal.

So I keep going back to – I mean, if you were here last year and you heard the description at lunch, when he talked about the days going through this – I listened to that with just so much respect for that Virginia Tech group. What they did really set a standard.

Now, I know they had one smaller practice run a couple of years earlier. But when you’re in the midst of that and you really make it work the way they did, you’ve done something really great. And that’s our target. At least within our group, that’s the target we’re shooting for: that in the end, somebody will look at what we’ve done and say the same thing.

So, what are the goals? Well, number one: useful information, right? We’re in the midst of something that requires information for people, and you want what they get to be useful. Using Virginia Tech’s numbers, you have to handle a whole month’s worth of traffic in one day. That was their peak day. And every hour, they handled ten times as much traffic as they normally do on average, except that they did it for 24 hours in a row, instead of the usual up-and-down cycles.
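
Turning those rules of thumb into a capacity target is simple arithmetic; here it is in Python, with a purely hypothetical baseline:

    # A month's traffic in one day, and roughly ten times the normal
    # hourly rate sustained for 24 hours straight.
    normal_day = 2_000_000                # hypothetical requests per day
    normal_hour = normal_day / 24

    emergency_day = 30 * normal_day       # a month compressed into a day
    emergency_hour = 10 * normal_hour     # the sustained hourly multiple

    print(f"normal:    {normal_hour:,.0f} requests/hour")
    print(f"sustained: {emergency_hour:,.0f} requests/hour, all 24 hours")
    print(f"peak day:  {emergency_day:,} requests")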

So we’ve started, just as many other schools have, to sit down and talk about this. How do you get there? How do you really do it? We had a tabletop exercise about two months ago, in which we brought together all the people in the university who would theoretically be involved in this response and said, “Okay, let’s talk our way through it.” At the end of four hours, they had covered the first ten minutes. And I think that’s probably a pretty good measure of how complicated it is.

And at the end of the ten minutes, I heard somebody in the room say, “And then they put it up on the web,” because they hadn’t gotten that far yet. So, basic assumptions – and as I’ve said before, it’s not even an assumption: the web is not your first line of defense in this kind of event. It’s too slow. It doesn’t reach enough people. They’re not where you’re sending the communication.

So you end up with two groups: probably campus security, and maybe your communications group. And if those two aren’t working together, it’s not going to work. But if they are, then you have that interesting balance of campus security saying, “Okay, we have enough information that you, communications, can release this.” And that’s finally the point where the web starts to make sense.

Our job is to get that response up. So what we’ve done is try to structure the process. How do you get there? Because who knows what time of day it is? Who knows where people are? Maybe it’s a holiday; maybe staff are off campus for some reason. So what do you do? Well, you have to have a way to switch. You’ve got to switch to your emergency page.

In our case, because we have the XSLT processing of the pages, what they did was make two XSLT files. One is the normal Princeton look and feel, and the other one does basically the same processing until it gets to the last line, or shortly before the last line, where it says, “Take everything we just did and throw it away, and now include that file over there, which is the emergency page,” and processes that. And that’s how they throw the switch: they rename a file. So anybody who knows how to use the editing system can go in, rename the file, and throw the switch.
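
The shape of that switch, sketched in Python with lxml; in the real system the override lives inside the XSLT itself, and all of these file names are invented:

    # Pick the stylesheet based on whether the renamed "switch" file is
    # present; otherwise render the normal look and feel.
    from pathlib import Path
    from lxml import etree

    def active_stylesheet(xsl_dir="/site/xsl"):
        emergency = Path(xsl_dir) / "emergency-on.xsl"   # exists only when thrown
        normal = Path(xsl_dir) / "princeton.xsl"
        return emergency if emergency.exists() else normal

    doc = etree.parse("/site/homepage.xml")
    page = etree.XSLT(etree.parse(str(active_stylesheet())))(doc)
    print(str(page))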

Now, we don’t let that be just anybody, because there’s access control. But effectively, anybody who’s allowed to can throw the switch. Then they publish, and it goes through our front-end servers. We immediately have two servers up and running that can handle our daily load. And our daily load, frankly, is so low that those two servers could handle a good portion of the increase we need.

But then, on the side, they call us and we start to bring up other servers. This one is sitting there all the time. We normally have an update cycle of one minute: go out and see if there are updates every minute and pull them in. That one runs with an update cycle of once an hour, so it’s close to what’s on the real site.

So if they make changes to the emergency page, we’ve got the changes out there before the incident. Our guys hop on, get that thing in sync by changing its cycle time to one minute, wait till it’s actually in sync, and put it on. Those boxes are running on ports 81 and 444; the normal ports are 80 and 443. Our load balancer has them in it, but it sees them as down because they’re not on the right ports. So you change the ports, and they come on.
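
That activation sequence, as a self-contained Python sketch; the Server class is a stand-in for whatever really controls the sync cycle and the listening ports:

    # Tighten the sync cycle, wait for catch-up, then move to the live
    # ports so the load balancer starts sending the box traffic.
    import time

    class Server:
        def __init__(self):
            self.update_cycle = 3600   # warm standby: sync once an hour
            self.ports = (81, 444)     # off the live ports, so the LB sees "down"
            self.lag = 3               # pretend: updates still to apply

        def in_sync(self):
            self.lag -= 1              # stand-in for real sync progress
            return self.lag <= 0

    def activate(server):
        server.update_cycle = 60       # the normal one-minute cycle
        while not server.in_sync():
            time.sleep(1)              # wait until it's caught up
        server.ports = (80, 443)       # now the load balancer picks it up

    activate(Server())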

These guys – we actually bring up two of the three; the last one right now is actually for testing. Those are way out of date. They haven’t been brought up for two months, and they’re going to take a while to actually get back in sync. But as soon as we bring up the first one, we now have four processors essentially sharing the same load.

As soon as they bring those up and they’re in sync – say, ten minutes later – I’ve got eight. I’ve already load-tested four of these to the Virginia Tech numbers, so I know I can do a month’s worth of processing on four of them. But I’ve got eight, so I’ve got a safety margin in case something goes away.

But as I said, how do you know it’s going to happen? You don’t. So where are you? The guy who’s on call gets the phone call that says, “Hey, we need the emergency processors up,” and he’s groggy in bed. How do you do that from home? Well, the way you do it is you plan for other ways into your network.

You plan to be able to get there from anywhere. Like, we’re talking about moving a server outside of Princeton in case the wires into Princeton go away. We’ll teach the editors how to get there to change it. It’s going to be the same software, so they’ll be familiar with it.

The interesting part, again, is that we can get on these boxes with a web browser. You’ve got a phone, you’re driving in a car: you pull over at the next rest stop, and you can help the folks back on campus get up what needs to be brought up. The big one, though, is you’ve got to test the boxes. We haven’t gotten there yet. We haven’t done a real test.

And then you have to document it, because the other question is: who’s actually going to be there? You don’t know. Well, you’d better leave enough documentation behind, and pointers to it, so that the folks who know what’s going on don’t have to be there.

Now, you want to tell the world the story, right? I mean, the whole reason we’re doing this is because of what happened at Virginia Tech and many other schools. The first time I realized the power of the web, somebody got up at a presentation and said, “We had something that hit the news, and our PBX was filled up by parents calling. And we put the story up on the homepage, and the PBX emptied.”

And I said, “Oh, that’s what we’re doing here. We’re making a place that people can go, and they go there more often when they trust what’s there as being true, just as much as if you were talking to them.” And so that’s our goal: how do we get it up there and make sure people can reach it? This gets back again to saying, well, we may have to put a server outside of Princeton, so that in case they can’t get to Princeton, there’s a server with the correct information on it somewhere.

We’re talking about things like Akamai or various other servers; the other one was maybe SunGard. But that does lead to a bunch of questions. Can you do updates on the front ends under heavy load? I don’t know. We haven’t tried that yet; it’s got to be tested. Will the front ends handle the load? I think so, because my testing said so. But we need to test it again while we’re doing the updates, to make sure.

And that really leads to the last rule, which is: if you’re going to plan for this kind of situation, the only thing you can do is test it, evaluate, test it, evaluate, test it, evaluate, and then next month test it again, because who knows what’s happened on the other systems?

So that’s what we’re doing today. We’re in the midst of that discussion of how we make this thing run close to 100% in the case of a disaster. And after that, we think we’ll in fact have solved a lot of other problems, because it’s so much easier to do this on a day-to-day basis than it would be under that pressure. So, do you have questions? Thank you.

[Applause]

Announcer: For more presentations from the 2008 HighEdWeb Conference, visit HighEdWeb.org/2008 or sign up for our podcast feed at HighEdWeb.org/podcast.xml.

[End of Music]