I’m not a project manager, but…, part 2.

As I said before, I’m not a project manager in real life, but I like to say I can play one on TV. I know enough to get by, and enough to come up with the plans, timelines, resources, etc. that give my projects life and get them to completion. It’s not a skill all sys admins have, as it requires thinking outside of the technology box and figuring out how your projects and deployments can affect other groups and other people. I like puzzles and can usually think my way around problems, and I think that’s helped me with this kind of thing.

The newest fad these days is Agile, with its terms like Daily Scrum, Sprints, Scrum Master, Demos, Reviews, User Stories, etc. If you happen to be on the job boards, there’s a good chance that a ton of the PM jobs you look at will mention Agile and ask for that kind of experience. I know, because my wife is a real Project Manager :).

Agile doesn’t have a PM, though. In Scrum (the most common flavor of Agile) the closest thing is the Scrum Master, I guess because you manage all the “scrum”: daily calls, burndown charts, stories, tasks, and so on.

You can get an official definition of Agile from Wikipedia, here: http://en.wikipedia.org/wiki/Agile_software_development. They have a Manifesto and everything:

Agile software development is a group of software development methods based on iterative and incremental development, where requirements and solutions evolve through collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development and delivery, a time-boxed iterative approach, and encourages rapid and flexible response to change. It is a conceptual framework that promotes foreseen interactions throughout the development cycle. The Agile Manifesto introduced the term in 2001.

The whole point of Agile, and why management loves the idea of it, is that you can get releases out the door quicker. Let’s say you have a team of Java Developers (who all know how to develop Java, of course!) and you know you’ve got a release coming up in 3 months. With Agile you would come up with the high-level deliverables and break them up into chunks called User Stories, making the work smaller and more manageable so that your developers have something easy to develop against. Then you have to decide how quickly you want this work done. From my research, typical “sprints” are 2-4 weeks (a sprint is your turnaround time to get a typical story done), although I’ve worked at a place that did 1-week sprints, which don’t work very well in my opinion.

You see, part of the problem with Agile is that it’s very top-heavy with meetings. You have a daily meeting where you go around and every person says what they did yesterday, what they plan on getting done today, and any issues they’re having. Then you have a meeting that can last half a day where you plan all the stories you want to get done in your sprint. Then at the end of the sprint you have another half-day meeting where everyone talks about or demos what they got done. The shorter your sprints, the greater the percentage of your time you spend in meetings.
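Here’s a rough T-SQL calculation of that meeting overhead for different sprint lengths. The durations are assumptions, not measurements: 8-hour days, 5-day weeks, a 15-minute daily stand-up, and the half-day (4-hour) planning and review meetings described above.

```sql
-- Rough meeting overhead per sprint length. All durations are assumptions:
-- 8-hour days, 5-day weeks, a 15-minute (0.25 h) daily stand-up,
-- and one 4-hour planning plus one 4-hour review meeting per sprint.
SELECT  s.sprint_weeks,
        s.sprint_weeks * 5 * 0.25 + 4 + 4                      AS meeting_hours,
        s.sprint_weeks * 5 * 8                                 AS working_hours,
        CAST(100.0 * (s.sprint_weeks * 5 * 0.25 + 4 + 4)
             / (s.sprint_weeks * 5 * 8) AS DECIMAL(5, 2))      AS pct_in_meetings
FROM (VALUES (1), (2), (3), (4)) AS s(sprint_weeks);
```

With those assumptions, 1-week sprints put roughly 23% of everyone’s time into meetings, compared with roughly 8% for 4-week sprints. The fixed planning and review meetings are what dominate as the sprints get shorter.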

So a “User Story” or just “Story” is the term used to describe what you want to get out of your sprint or tasks. Let’s take our example of the release you want to get done in 3 months. Say your goal is to have your web site up and running. Your story might be in the format of:

“As a web developer, I want to have a finished website so that I can browse to it and give it to customers for them to order products”.

Note the 3 parts of that Story: “As a…”, “I want…”, and “so that…”. The “As a…” part matters because you need to write your Stories from different perspectives so that everyone knows which direction you’re coming from. Another tack for this one might be “As a customer of XYZ Company, I want a website so that I can order products”.

The 2nd and 3rd parts are fairly obvious: they’re what you want and why you want it.

This story might be called your “Epic” Story, which simply means that it’s the very high-level version of what you want. You don’t really get requirements, needs, or what it takes to get it done from that story, but you can then create sub-Stories such as “As a developer I need to get the requirements for product ordering on the web from a customer” or “As a Marketing Associate, I need a web site that advertises our products so that we can sell more”. As you can see, for this 3-month release cycle you could generate hundreds of stories, most of which you won’t know on day 1. As you go on and refine your products and requirements, they generate other stories that get to the nitty-gritty, such as: “As a graphics designer I need to build the logo for the web site so that…”

The idea behind Agile is to do your best to break the work down into workable chunks so that anyone on your team can work on any task and throw their results in the pot. High-level stories are usually generated by the Product Owner (another Agile term, which simply means the person who is requesting the product and probably providing all the requirements). Lower-level stories that get done in the sprint are usually tasked out by the team in planning sessions. Then the results are demoed in your Sprint Review. The idea here is that if you have requirements to build a screen or web page, you can take it back to the Owner and team (even if it’s only GUI and doesn’t actually do anything yet) and see if they like it. If they don’t, you take it back, create a new story for it, and re-work it. The example we had in training was that the Owner says they want the color to be yellow, then after they see it they want it purple. So next week you make it purple. Then they see purple and want it red… this can go on for as long as you let it. One of the major tenets of Agile (which I don’t agree with at all) is that it’s cheaper to do re-work (and re-work, and re-work) than it is to plan things out using the Waterfall method and just do it right the first time.

Well, that’s high-level Agile. I tried to keep it unbiased. Now for the bias 🙂

Let’s re-read that blurb from Wikipedia:

Agile software development is a group of software development methods based on iterative and incremental development, where requirements and solutions evolve through collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development and delivery, a time-boxed iterative approach, and encourages rapid and flexible response to change. It is a conceptual framework that promotes foreseen interactions throughout the development cycle. The Agile Manifesto introduced the term in 2001.

Notice the phrase that shows up twice in that first sentence? “Software development”. The first thing I tell people is that I’m NOT a Software Developer, and I definitely don’t play one on TV. I’m a Systems Admin, Engineer, & Architect. Pick your flavor and that’s me. I might be asked to come in to a company and “migrate us from Exchange 2007 to Exchange 2010, on a new Active Directory Forest”. That’s a long-term project requiring buying tons of software or writing scripts, and coming up with plans for how to architect your new AD and your new Exchange DAGs, how to migrate PCs and regular domain members over, etc. Give me a day and I can have a 10-page document on how this project affects your environment. We’d also need to discuss HA and DR options, decide how much of that we want to throw in the mix, etc.

Your ask (or your Epic User Story) might be an incredibly simple statement, but it generates tons of work depending on the size of your environment and might end up being a multi-year project.

In a case like this, do you imagine I’m going to use Waterfall, or am I going to use Agile? Waterfall, hands-down. There’s no room for error in a project like this. You can’t simply do a lot of work, push it out, and then hear “but I wanted it green”. It needs to be very precise, and it needs to be planned out to the nines. If you don’t account for the fact that at some point you’ll need to migrate your users over with ADMT, and that you’d better have migrated their SIDHistory so they can still access their resources on FileServerA… the first user you migrate will be a colossal failure. Oh, and hopefully you didn’t forget about your Public Folders and Shared Mailboxes. How did you account for those? And what if your team only has 1 Exchange guy? You’re not really at a place where you can task a story out and let everyone on your team do the work. Plus, you’ve got a team of Windows Engineers: maybe 2 Exchange guys, 2 SQL guys, 2 web guys. They all have very specialized skills, and what admin wants to be a generalist who knows just a little bit about everything? That’s the guy who ends up not moving up the ladder because he just isn’t specialized enough.

This post is already way longer than I wanted it to be, but I’ll just end it by saying that I think Agile has its place: software development. You have a team of Java developers (maybe with varying levels of skill, but you get the point) and they can all mostly do each other’s jobs. With a team of Engineers, you probably have 1 guy who’s doing all the work on the project.

System Center 2012

So I’m signed up for the 2 beta exams next week. Having fun studying 🙂

Basically running through this very informative guide here: http://technet.microsoft.com/en-us/evalcenter/hh505660

You have to log in and then sign up. Once in, you can download all the main software and the “Microsoft Private Cloud Evaluation Guide”. It gives a very straightforward walkthrough of how to prep and install all the System Center 2012 components and configure them to talk to each other. I’m not all the way through it yet, but so far it’s some pretty heady stuff. You need around 7 or 8 servers, depending on what all you want to install. I don’t have a server with Hyper-V on it since I’m working in a VM cloud environment, so hopefully that doesn’t kill me too much down the line.

There is a ton of prep work involved with downloading and extracting all the software. You also have to download and prep all the prerequisite software; the installers won’t let you go any further until it’s in place.

You also can’t cheap out like I did and try to install multiple components on the same server. So be prepared and have all your servers up and running with the base OS (2008 R2). There’s also a step in there to set up GPOs that enable some WinRM settings. I again tried to cheap out and skip that part, and it came back and bit me on the butt. So in other words: Follow the guide!

Here’s where I’m at now. There’s more coming!

SQL 2012 Launch

First off, can I say how stoked I was with some of the new features of SQL 2012? I don’t usually geek out this much over software stuff, but I have to say there were several WOW factors on the launch yesterday.

The bad news, to balance out the good, is that the Launch Event itself was completely terrible. A thousand others on Twitter and I spent a good 30 minutes just waiting for the login screen to come up. When it finally did, it still wouldn’t let you log in. Finally, about an hour after the scheduled start time, everything was up and running. Epic fail on Microsoft’s part.

The other bad was that there was nothing live. It was all pre-recorded videos on various SQL topics. They were good videos, but nothing is better than a real launch event where you see it all on the big screen and get a big grab-bag full of goodies. I attended the Windows 7 launch a few years back at the Denver Convention Center and it was pretty awesome.

So here’s a link to the launch event. The videos themselves are supposed to be available for 90 days and it’s free to register, so give them a whirl:

http://www.sqlserverlaunch.com/WW/Login

What new features did I like?

Resource Governor for multi-tenancy

  • Have multiple customer databases on your SQL server and want to meter them? The resource pools allow you to set minimum and maximum CPU and memory limits. You can allow bursts or, if you bill customers by CPU, cap them at the maximum so they don’t get charged extra.
  • Also, much better billing options. The stats on your server now match exactly what you’re billing your customers, and it really is what they’ve used. How many times have you pulled CPU/memory stats only to find them impossible to reconcile with actual customer usage?
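A minimal sketch of that per-tenant setup, assuming each tenant connects with its own SQL login. The pool, workload group, function, and login names (TenantA_Pool, TenantA_Group, dbo.fn_TenantClassifier, TenantA_Login) and all the percentages are hypothetical. MAX_CPU_PERCENT only applies when there’s CPU contention, which is what lets a tenant burst; CAP_CPU_PERCENT is the hard cap added in SQL 2012. Resource Governor is an Enterprise edition feature.

```sql
-- Classifier functions must live in master and be schema-bound.
USE master;
GO
CREATE RESOURCE POOL TenantA_Pool
WITH (
    MIN_CPU_PERCENT    = 10,   -- guaranteed share when the CPU is busy
    MAX_CPU_PERCENT    = 40,   -- soft limit: enforced only under contention (bursts allowed)
    CAP_CPU_PERCENT    = 50,   -- hard limit, new in SQL 2012
    MIN_MEMORY_PERCENT = 10,
    MAX_MEMORY_PERCENT = 30
);
GO
CREATE WORKLOAD GROUP TenantA_Group USING TenantA_Pool;
GO
-- Route each new session to a workload group based on its login (hypothetical login name).
CREATE FUNCTION dbo.fn_TenantClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname = N'default';
    IF SUSER_SNAME() = N'TenantA_Login'
        SET @group = N'TenantA_Group';
    RETURN @group;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_TenantClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
-- Per-pool usage to bill against (cumulative since the last statistics reset or restart).
SELECT name, total_cpu_usage_ms, used_memory_kb
FROM sys.dm_resource_governor_resource_pools;
```

If you bill per period, ALTER RESOURCE GOVERNOR RESET STATISTICS zeroes the cumulative counters at the start of each billing cycle.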

AlwaysOn Availability Groups

  • I’m a Windows Clustering guy, but I’m also a fan of SQL Replication/Mirroring and the options that gives you. AOA Groups take this to a whole new level.
  • The demo I saw still had the base Windows servers in a cluster (a couple local, 1 as a standby, then another at a different site), and with AlwaysOn you could specify for each database which server was primary for it, which was an automatic failover server, which was a manual failover server, and whether you wanted the data to be copied synchronously or asynchronously. I immediately saw HUGE benefits for a job at a prior company where we had geo-clusters spread across different locations.
  • The big caveat with AOA Groups is that your end-user application still has to be aware of the nodes, and its DSN has to be cognizant of all the options.
  • One other option they mentioned but didn’t go into was that you can add a Listener to the configuration and it’s basically the same as the old Virtual IP/Network Name and the end-user application doesn’t necessarily need to be aware of everything that’s going on.
  • You can also specify any of your secondaries as a read-only copy of the data, and use it as a backup point, for reports, or anything like that. This is a brand-new feature, as previously you couldn’t query a Mirrored database copy directly (short of taking a database snapshot of it).
  • You also have what they call a “flexible failover policy” where you can specify what parameters can initiate an automatic failover. Again, I see huge benefits for builds I’ve done at prior clients where sometimes the network would hiccup and the DB would failover and no one knew until it was slow.
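For reference, here’s roughly how the demo’s per-replica choices map onto T-SQL. This is a sketch that assumes a Windows failover cluster already exists, AlwaysOn is enabled on each instance, database mirroring endpoints are listening on port 5022, and the database is in the FULL recovery model with a full backup taken. The group, database, server, domain, and listener names and the IP address are all hypothetical. FAILURE_CONDITION_LEVEL is the flexible failover policy setting (1 through 5, default 3), and the LISTENER clause creates the virtual network name mentioned above.

```sql
-- Run on the instance that will host the primary replica.
CREATE AVAILABILITY GROUP SalesAG
WITH (FAILURE_CONDITION_LEVEL = 3)          -- flexible failover policy
FOR DATABASE SalesDB
REPLICA ON
    N'SQLNODE1' WITH (                       -- local primary
        ENDPOINT_URL      = N'TCP://SQLNODE1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC),
    N'SQLNODE2' WITH (                       -- local automatic-failover partner, readable
        ENDPOINT_URL      = N'TCP://SQLNODE2.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)),
    N'SQLDR1' WITH (                         -- remote site: async, manual failover, readable
        ENDPOINT_URL      = N'TCP://SQLDR1.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY))
LISTENER N'SalesAG-L' (                      -- virtual network name clients connect to
    WITH IP ((N'192.168.100.50', N'255.255.255.0')),
    PORT = 1433);
GO
```

Each secondary then joins with ALTER AVAILABILITY GROUP SalesAG JOIN. The database is restored there WITH NORECOVERY and added with ALTER DATABASE SalesDB SET HADR AVAILABILITY GROUP = SalesAG.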

That’s really only 2 of the major changes in SQL 2012, but it’s enough to make me download it and start playing!

SQL Mirroring with SQL 2008, part 2

So in my previous post I went through the reasons why you might want to use SQL mirroring and how you could possibly sell the added expense to your management. In this post I’m going to cover the actual SQL commands for setting it up. There is a wizard you can run through instead (right-click your database, go to Properties, click on Mirroring on the left side, and run the “Configure Security” wizard), but what’s the fun in that? Plus, I’ve had the wizard not quite set things up right before.

Prerequisites: for the purpose of this article, let’s assume that you have 3 servers (physical or VM, doesn’t matter) with Windows 2008 R2 installed, joined to your AD domain. You also have SQL 2008 R2 installed with the latest service pack. All three SQL installations use the default instance and run under the same Windows AD service account.

Let’s assume your servers are called SQLRepl01, SQLRepl02, and SQLRepl03. We’ll use SQLRepl01 as the Principal instance, SQLRepl02 as the Mirror, and SQLRepl03 as the Witness. You can actually use any of the 3 for any of these roles and for your next database you could change it completely.

For my test I created a database called ReplicationTest and populated it with data using this wonderful article: http://www.mitchelsellers.com/blogs/articletype/articleview/articleid/249/creating-random-sql-server-test-data.aspx
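One more prerequisite worth checking before you start: database mirroring only works with databases in the FULL recovery model. New databases inherit their recovery model from the model database, so a quick check on SQLRepl01 saves you a failed setup later:

```sql
-- Mirroring requires the FULL recovery model; check the current setting.
SELECT name, recovery_model_desc
FROM   sys.databases
WHERE  name = 'ReplicationTest';
GO
-- If it reports SIMPLE or BULK_LOGGED, switch it before taking the
-- backups used to seed the Mirror, so the log chain starts from that full backup.
ALTER DATABASE ReplicationTest SET RECOVERY FULL;
GO
```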

Okay, so now that all that’s done, let’s actually get started.

1. First we need to create an EndPoint on each SQL server so that it can listen for the Mirroring traffic. This needs to be run on all 3 SQL instances. 5022 is the default port used for this, but you can specify any unused port you want; just make sure that the servers can talk to each other on this port (i.e. it’s opened in the Windows Firewall and any other firewalls that exist):
-- Run on SQLRepl01, SQLRepl02, and SQLRepl03.
-- ROLE=ALL lets this instance act as Principal, Mirror, or Witness.
CREATE ENDPOINT Mirroring
    STATE=STARTED
    AS TCP (LISTENER_PORT=5022)
    FOR DATABASE_MIRRORING (ROLE=ALL)
GO
2. Okay, so now that your endpoints are all created, we need to back up the existing database and then restore it on your mirror instance. This creates, in essence, a “seed” database. Run the following backup commands on SQLRepl01:
USE MASTER
GO
-- Full backup plus a log backup to seed the Mirror.
BACKUP DATABASE ReplicationTest TO DISK='C:\TEST.BAK'
GO
BACKUP LOG ReplicationTest TO DISK='C:\TEST.LOG'
GO
3. Browse to C:\ on SQLRepl01 and copy TEST.BAK and TEST.LOG to \\SQLRepl02\c$. Then log in to SQLRepl02 and run the following commands. Note that you have to restore WITH NORECOVERY to leave the database in a state that can receive the mirrored transactions. If you don’t, you’ll need to restore the database again. This ensures that the 2 databases are a mirror of each other. Once restored, the database shows up in Object Explorer as ReplicationTest (Restoring...).
RESTORE DATABASE ReplicationTest FROM DISK='C:\TEST.BAK' WITH NORECOVERY
GO
RESTORE LOG ReplicationTest FROM DISK='C:\TEST.LOG' WITH NORECOVERY
GO
4. Okay, now that the databases are there we can actually set up the mirror. IMPORTANT: run this command on the mirror instance (i.e. SQLRepl02). If you changed the port away from the default of 5022, change it in the code below.
-- On SQLRepl02: point the Mirror at the Principal.
ALTER DATABASE ReplicationTest
SET PARTNER= 'tcp://SQLRepl01:5022'
GO
5. Run the following commands on the Principal server (i.e. SQLRepl01). The first one tells it that the second server is the Mirror, and the second one tells it that the third server is the Witness.
-- On SQLRepl01: point the Principal at the Mirror, then add the Witness.
ALTER DATABASE ReplicationTest
SET PARTNER='tcp://SQLRepl02:5022'
GO

ALTER DATABASE ReplicationTest
SET WITNESS='tcp://SQLRepl03:5022'
GO

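At this point your Mirroring setup is complete. In Object Explorer, the database now shows as ReplicationTest (Principal, Synchronized) on SQLRepl01 and ReplicationTest (Mirror, Synchronized / Restoring...) on SQLRepl02.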
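To check the same thing from T-SQL instead of Object Explorer, you can query the sys.database_mirroring catalog view on either partner. A minimal sketch, filtered to ReplicationTest:

```sql
-- Mirroring status for ReplicationTest, as seen from the instance you run it on.
SELECT  DB_NAME(database_id)          AS database_name,
        mirroring_role_desc,           -- PRINCIPAL or MIRROR
        mirroring_state_desc,          -- e.g. SYNCHRONIZING, SYNCHRONIZED, SUSPENDED
        mirroring_safety_level_desc,   -- FULL (synchronous) or OFF (asynchronous)
        mirroring_partner_name,
        mirroring_witness_name,
        mirroring_witness_state_desc   -- CONNECTED, DISCONNECTED, or UNKNOWN
FROM    sys.database_mirroring
WHERE   database_id = DB_ID('ReplicationTest');
```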

You can test it by shutting down the Principal server (SQLRepl01) and making sure that the database automatically fails over to the Mirror. Or you can add data to the Principal and take a database snapshot of the Mirror on SQLRepl02 to make sure the data is getting there. Because you can’t access the mirror database directly, this is the only way to query its data without failing over. Note that even though mirroring works with SQL Standard edition, database snapshots don’t, so if you’re using Standard the only way to view the data is to fail over. On Enterprise edition, run this on SQLRepl02:

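-- Run on the Mirror (SQLRepl02). NAME must be the logical name of the source
-- database's data file (ReplicationTest by default for a database created with defaults).
CREATE DATABASE Test_snapshot
ON (NAME = 'ReplicationTest', FILENAME = 'C:\Backups\Test_Snapshot.SNP')
   AS SNAPSHOT OF ReplicationTest
GO
-- Query one of your test tables through the snapshot (dbo.YourTable is a placeholder name).
SELECT * FROM Test_snapshot.dbo.YourTable
GO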
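If you’d rather not pull the plug on SQLRepl01 to test failover, a gentler option is a manual failover. It works when the session is in synchronous (SAFETY FULL) mode, the default for the setup above, and the databases show as Synchronized. A sketch:

```sql
-- Run on the current Principal (SQLRepl01); SQLRepl02 becomes the Principal.
USE master
GO
ALTER DATABASE ReplicationTest SET PARTNER FAILOVER
GO
-- To fail back, run the same statement on SQLRepl02 (the new Principal).
```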

One of the real benefits of this is that you can set up 3 databases (or as many as you want) and have each one’s Principal on any of the 3 servers, with its Mirror and Witness on the other 2. Note that if you use the Wizard to create all of this, it won’t set up the Endpoint on the Witness instance correctly. Since we created the endpoints with ROLE=ALL, we can use each of the servers for any of the roles we want.

And that’s it! Let me know if you have any questions.

SQL Mirroring with SQL 2008, part 1

So you’ve made the decision that you want to use SQL Mirroring for your DR solution. You have several options for DR for SQL databases: you can use Windows clustering, log shipping, or mirroring. My preference in any scenario is to just use Windows Clustering, but there are reasons why you could go with another solution. Clustering is nice because it’s outside of SQL and just sits at the OS level, which means it doesn’t require you to be a SQL expert or even have a fair bit of SQL knowledge. But by definition you are going to have some hardware sitting there idle, even if you go Active/Active (assuming you planned for 1 server going down and allocated your CPU/RAM appropriately). Mirroring, on the other hand, lets you be fully HA and DR, as you can have both servers sitting in different sites on different subnets and not have to worry about having your network guys stretch a subnet geographically, which they never want to do.

Note that SQL 2008 and 2008 R2 cannot be used on a Windows 2008 geo-cluster whose nodes are on different subnets (multi-subnet SQL clustering only arrived with SQL Server 2012).

And in case you weren’t aware: HA = High Availability, DR = Disaster Recovery. They’re separate concepts, but not mutually exclusive.

All that being said, let me give you a quick rundown on SQL Mirroring. I can’t say it better than Microsoft, with their gaggle of technical writers, so here’s an article from them on the benefits of Database Mirroring:

http://msdn.microsoft.com/en-us/library/ms189852.aspx

A few items of note that I always like to tell my clients:

  • You need a minimum of 3 instances of SQL for full redundancy on Mirroring (Principal, Mirror, Witness). You can technically do it in 2, but you lose the capability for automatic failover without the witness. Think of it as the 3rd vote if this were a cluster.
  • In the simplest setup (the one I use), all SQL servers are members of an AD domain and run under the same service account
  • There is a copy of the database on both the Principal and the Mirror.
  • The Witness allows for automatic failover in the event of a server failure
  • Mirroring is on a per-database level, not a per-server level. A single SQL instance can host a mixture of standalone databases plus principal and mirror copies.
  • The database can be set in synchronous or asynchronous mode
    • Under asynchronous operation, the transactions commit without waiting for the mirror server to write the log to disk, which maximizes performance
    • Under synchronous operation, a transaction is committed on both partners, but at the cost of increased transaction latency
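Switching between the two modes is a single ALTER DATABASE run on the Principal. In this sketch, ReplicationTest and SQLRepl03 are the example database and Witness from the setup walkthrough. Asynchronous (high-performance) mode requires Enterprise edition. Automatic failover only happens in synchronous mode, so Microsoft recommends removing the witness when you go asynchronous.

```sql
-- Run on the Principal.
-- Asynchronous (high-performance): commits don't wait for the Mirror's log write.
ALTER DATABASE ReplicationTest SET PARTNER SAFETY OFF
GO
-- No automatic failover in this mode, so remove the witness.
ALTER DATABASE ReplicationTest SET WITNESS OFF
GO

-- Back to synchronous (high-safety), then re-add the witness for automatic failover.
ALTER DATABASE ReplicationTest SET PARTNER SAFETY FULL
GO
ALTER DATABASE ReplicationTest SET WITNESS = 'tcp://SQLRepl03:5022'
GO
```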

There are also a few selling points for Mirroring, if you’re having trouble selling having all these servers to your management or to your clients.

  • Since mirroring with automatic failover requires 3 servers to be up at all times, you can put them all to work by running databases on all 3 servers, some mirrored and some not. You can also have any combination of Principal/Mirror/Witness running across the 3 servers (i.e. Active/Active/Active)
  • Mirroring can be done across multiple sites and subnets, thus allowing for full HA/DR

In my next article I’ll get into the nitty-gritty of how to actually set this up and show you all the SQL commands for this.

Creating/converting a MNS 2008 Cluster with EMC RecoverPoint (part 2)

In my previous post I covered the considerations you’d want to make when adding a 3rd node at a new site to your existing shared quorum cluster. Now that you’ve made the decision to use EMC RecoverPoint with Cluster Enabler (RP/CE) to handle the data replication and disk management, and to convert your cluster to MNS, I’ve written up the steps to actually do this.

The EMC documentation is clear as mud on this. Literally you’ll go to the index where it says “Cluster Enabler install” and it’ll have step 1, then say “go to page 127”. You’ll go there and it’ll have step 2 and then will say “go back to page 76”… On and on. It’s actually so confusing that the consultant we had come from EMC to help answer our questions later called me and asked for my documentation so that he could use it at an installation at another client.

Please note that the steps below worked in my environment but may need some changes to conform to the specifics of yours. Where noted, there are different steps for 2003 and 2008 clusters. These steps assume that your SAN group has already replicated all the appropriate LUNs with RecoverPoint and that you’ve base-installed any new nodes.

1) Install Windows Installer 4.5 (if not already installed)

2) Install CE on all host nodes in the cluster (including the 3rd node that you’ve already base installed and have not yet added to the cluster).

  • Copy both the *base.msi and *plugin.msi to the same directory on your target machine (e.g. C:\temp)
  • Run *base.msi, accept all the defaults. Reboot
  • Repeat for the existing nodes in the cluster, moving resources around as necessary. Note that at this point you’re only installing the files; you’re not actually configuring the cluster for CE yet.

3) If your SAN group was nice enough to name the RecoverPoint Consistency Group (CG, the set of replicated LUNs on the SAN) the same as your Windows Cluster Group, then you’re fine. Otherwise you need to rename the Windows Cluster Group to match the name of the RP CG. All the disks in the same Windows Cluster Group must be in the same Consistency Group on the SAN side, so the disks in the CG need to match the disks in the Cluster Group. Renaming a Cluster Group doesn’t affect anything else.

4) Have your SAN group ensure that your disks are replicating successfully and in sync.

5) Convert your cluster to MNS

  • Windows 2008: Right-click on the cluster and go to More Actions —> Configure Cluster Quorum Settings. Select “Node Majority” and click through the wizard to Finish.
  • Windows 2003: Right click on the “Cluster Group”, select New —> Resource and select the name as “MNS Resource”. Change the resource type to “Majority Node Set”. When done, bring the resource online. Right click on the root name of the cluster and select the Quorum tab. Select the “Quorum Resource” drop down box and change it to the “MNS Resource” you created.

6) Delete the old Quorum disk (Q:) from the cluster groups.

7) Assuming you have any, delete the Private networks from the cluster. You can’t use them for cluster communications anymore unless you’re stretching that private subnet to the new site as well.

8) Have your SAN resource go into RP and enable image access on the 3rd node at the remote site.

9) Right-click the cluster and select Add Node. Add the server name and run through the validation wizard. You now have a 3 node MNS cluster.

10) Have your SAN resource go into RP and disable image access on the 3rd node. They also need to go into the RecoverPoint Management Application and select the Consistency Group. In the Components pane, select the Policy tab. In the Stretch Cluster Support area, check Use RecoverPoint/CE. Ensure that “Group is managed by CE, RecoverPoint can only monitor” is selected.

  • This step is very important! If you have trouble later it’s likely that your SAN resource did not do something in this step correctly.

11) On each node of the cluster go to All Programs —> EMC —> Cluster Enabler —> RecoverPoint Access Settings

  • Type in the IP of the RPA (you’ll get this from your SAN resource). There should be one on both sides of the WAN. Use your local one on each side.
  • The default userid/password is plugin/plugin. I suggest having the SAN guys change the default and tell you what the new account is.

12) In the same Start Menu group, go to EMC Cluster Enabler Manager

  • Click Configure CE Cluster
  • You should be able to accept the defaults on the rest of the wizard. If you get an error it’s likely because of step 10 or 11.

13) At this point you’re technically done. You’ve got a 3 node MNS cluster with RP/CE. You should be able to fail your cluster groups between the 3 nodes without any issues. If you can’t bring the disks up on any of the other nodes, check step 10. You HAVE to have CE manage the cluster. CE is what’s installed on your cluster nodes and you now have a new resource in the cluster that all your disks are dependent on.

But of course, before you can truly fail over to the 3rd node you need to install your application onto the new node. I can’t tell you those steps since I don’t know your app, but they should be the same steps as when you did the 2nd node. Note that how you do the 3rd node install for SQL varies by version. Sometimes you have to slipstream Service Packs into your base SQL binaries and then just run setup. Older versions may require you to do a command line install with certain switches. Make sure you read the documentation!

Creating/converting a MNS 2008 Cluster with EMC RecoverPoint (part 1)

I was supporting a handful of Windows 2008 (non-R2) 2 node clusters with shared quorum disks. Some had SQL 2008 installed and some were just a vendor application that we supported. For the purposes of this article it doesn’t really matter which so we’ll assume we’re talking about SQL 2008.

So the existing configuration was a 2 node Active/Passive SQL 2008 Cluster on Windows 2008 using shared EMC storage and a quorum (Q:) to hold the vote. They also had a private NIC (hard-wired crossover cable) and a public NIC on the 192.168.100.0/24 subnet. This is a high-availability (HA) environment.

The company purchased a new datacenter and for disaster recovery (DR) purposes wanted to extend the cluster down to the new datacenter. This would allow us to have a cluster with both HA and DR (i.e. able to recover almost immediately, and also to come back up in case the primary datacenter disappeared).

There are several decision points when it comes to how you would extend your cluster to the new site:

1. Will you need to “stretch” your public VLAN down to the new site (i.e. have the same VLAN on both sides of the WAN), or will you be able to put the new cluster node on a new subnet?

  • 2008 supports having cluster nodes on different subnets; 2003 doesn’t. That’s your first answer. The second answer is that some applications (including SQL 2008) do NOT support clusters whose nodes are on different subnets, so they require a stretched subnet.
  • The next question is whether your network team is willing/able to stretch the subnet. Everywhere I’ve worked, the first answer from the network team is a resounding NO, but eventually you can wear them down!

2. How will you replicate your data to the new site? Windows clustering does not replicate the data for you; the cluster just expects it to be there.

  • There are several solutions for this, but in my case we were an EMC shop, so we ended up using EMC RecoverPoint, which does block-level copies on the SAN over the WAN. Note that whatever you use has to be able to copy the data either asynchronously or synchronously; which one you pick just depends on how quickly you want your cluster up.
  • Also note that your cluster nodes at Site 1 (nodes 1 and 2) can STILL share their storage between them. The cluster nodes at the other sites will have their own copy of the storage (and can even share between multiple nodes there). That’s where your storage software (PowerPath, etc.) comes in handy.

3. How many nodes will you put in your new cluster and what quorum model will you choose?

  • This is a very contentious issue and everyone has their own opinion. As always, it depends.
  • If your data center (DC) model is that 1 DC is primary and the other is only for DR, then you want your primary DC to win the “vote” if the link between the 2 DCs goes down. You don’t want there to be a voting storm, or for the secondary site to ever think it can win the vote.
  • As far as the vote is concerned, the primary site needs to be able to win a majority. In a 2 node shared quorum model there are 3 votes (one per node plus one for the quorum disk), so it takes 2 votes to win, and whoever owns the quorum disk gets its vote.
  • In a 3 node majority node set (MNS) model there is no shared quorum anymore, so it still takes 2 of the 3 votes to win. If you have 2 votes at DC1 and 1 vote at DC2, DC2 will never take primary on its own (although you can certainly force it). If you lose the link, your primary site should still be okay, which is what you’d want.
  • So if you went with a 4 node MNS cluster, 2 nodes at each site, you’d need 3 votes for a majority, and if you lose the link neither site would EVER get it. In that case the cluster resources would all go offline, since no one can get a majority.
  • If you went with a 5 node MNS, you’d still need 3 votes, but then you have the quandary of where to put the 5th node. You can put it at your primary site and be fine, but then you have to ask what adding the last 2 nodes really buys you (ignoring the question of Active/Active clusters).
  • In the best-case scenario you have a THIRD DC, and you put the 4th or 5th node (or 12th, for that matter) at the 3rd site with independent connections to both of the other data centers. Then its vote always counts. But you still always have the problem of what happens if any/all of the datacenters become isolated, and what you want to happen when they do.
  • Your other option, rather than standing up a whole 5th node just to cast a vote, is to use what’s called a file share witness (FSW) on a file server (the Node and File Share Majority quorum mode in Windows 2008), which is simply a file share that has the ability to cast a vote. For voting purposes it counts the same as any other node.

4. Your next question is how you want Windows to manage who owns the disks in the cluster and who gets to make them active.

  • This is usually dictated by your replication software. You always have the option to do it manually (i.e. bring up the disks manually in a failover scenario). In our case we were using EMC RecoverPoint, so we used EMC Cluster Enabler to manage the disks from the OS side.
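The vote math in item 3 boils down to: votes needed = floor(total votes / 2) + 1. Here’s a small T-SQL sketch that runs the numbers for the layouts described there. The vote counts are those hypothetical layouts, and this is plain arithmetic, not a query against a real cluster.

```sql
-- Majority-vote arithmetic for the quorum layouts above.
-- Integer division gives FLOOR(total / 2), so votes needed = total / 2 + 1.
WITH models AS (
    SELECT model, dc1_votes, dc2_votes, third_site_votes,
           dc1_votes + dc2_votes + third_site_votes AS total_votes
    FROM (VALUES
        ('3-node MNS, 2 at DC1 + 1 at DC2',        2, 1, 0),
        ('4-node MNS, 2 at each DC',               2, 2, 0),
        ('5-node MNS, 3 at DC1 + 2 at DC2',        3, 2, 0),
        ('4-node MNS + file share witness at DC3', 2, 2, 1)
    ) AS v(model, dc1_votes, dc2_votes, third_site_votes)
)
SELECT model,
       total_votes,
       total_votes / 2 + 1 AS votes_needed,
       -- If the DC1-DC2 link drops, can either site stay up on its own votes?
       CASE WHEN dc1_votes >= total_votes / 2 + 1 THEN 'yes' ELSE 'no' END AS dc1_alone_survives,
       CASE WHEN dc2_votes >= total_votes / 2 + 1 THEN 'yes' ELSE 'no' END AS dc2_alone_survives,
       -- ...and what if DC1 can still reach the third site's vote?
       CASE WHEN dc1_votes + third_site_votes >= total_votes / 2 + 1
            THEN 'yes' ELSE 'no' END AS dc1_with_dc3_survives
FROM models;
```

The results match the bullets. The 2 + 1 and 3 + 2 layouts keep DC1 up on its own, 2 + 2 leaves neither side with a majority, and adding a third-site vote lets whichever side can still reach it win.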

As you can see, there are lots of decision points when you want DR, and when you create or convert clusters to add nodes and get full HA and DR. In my next post I’ll talk specifics on how to convert a 2 node shared quorum cluster to a 3 node MNS cluster with EMC RecoverPoint and Cluster Enabler for management.