Monday, July 2, 2012

How to measure a DBA's workload?


For the past some days, I was thinking of ways to measure the workload of a DBA or team of DBAs in the context of operations. Metrics such as ticket counts, instance or database counts, etc. each have their own imperfections but they all have their advantages, too. Maybe you've got a simple formula that combines multiple metrics?

It all depends on what is the purpose. When planning DBA teams resourcing at one of my previous work place, we were using time because we know how much time we get from the team (minut vacations, overheads, training, internal projects and etc) and how much time we committed to the customers and the forecasts. I didn't see a single or a combination of dimensions that coule reliably give you universal indicator. If you take database count, for example, one customers would take 10 times more efforts per database compare to another but the environment and scope of work are different. It's always comparing apple to oranges.

When I managed a team, I split it into five tracks. Tickets (Issues/Problems), Projects, Learning, Admin and Timeoff. Time and effort expended in hours were the basic metrics. The key is to have a time forecast of loading or utilization based on historical trends of tickets, current projects being worked on, Admin and Timeoff so you can demonstrate the capacity of your team in a systematic way. We could learn a lot from the PM folks. They are always busy and they can prove it!! If you don't have this, you are going to be challenged on what the resources are doing and you will be overloaded. Run your group as a consulting busines and you will find it is the way to go.

I have done a little bit of research on the same before on this subject, please see http://www.khwajaimran.blogspot.com/2012/07/how-many-dbas-do-you-need-to-support.html

Essentially, I found that an average DBA can take care of 40 databases, up to 5 TB. Bigger data is harder to handle, so it isn't universal, but this is a good baseline. I know every shop is different, but I have been to at least a hundred different companies and this seems to hold relatively true, even across industries and vendors. Caveat emptor, there are always exceptions.

Were I measuring a team, I would come up with a baseline somewhat like this, but customized door our environment. Then I would look at whatever factors either increased revenue, decreased cost, or removed risk for the business in a quantitative way. Rank your people based on these, as they are the only things that your executive team cares about. If you can explain your teams actions in these terms you will be much better off.

From my experience as a DBA for over 8 years, I agree with the 40:1 ratio and DB Size but at the same time load on DB is also an major point where managers needs to keep eye's close and estimate head count requirement for respective Technology DBA's. I define DB Load as number of concurrent sessions and transactions. We may also see Data warehouse systems with us, but comparatively less concurrent sessions at the same time DB must be tuned is such a way that daily data loading and other process should complete within the agreed SLA limits so at that time DBA also have load on them. So, when I estimate the DBA requirement, I will mostly take care of 
  1. Technology DBA's 
  2. Number of DB/Instances
  3. Number of Servers 
  4. Each DB Size
  5. Load on Each DB/Instance 
  6. Capabilities of existing staff members 
  7. Expected average completion time for each task for Production DB Support 
  8. Projects In Hand (Other than the regular Production Support) 
  9. Expected Projects
  10. Project time lines 
  11. Staff (Scheduled Leaves & Unscheduled Leaves taken into consideration)
  12. Last but not the least Commercials: This is required for low cost projects and tasks should be done by junior staff and critical deliverable which impact business can be taken care by senior Staff.
Again, this is always tough to document and prove. I think it depends on the priorities from up top. For us, even though DB projects lag or never get done the executives don't really care about that metric. Nor do they care about the number of instances or databases managed per dba. They see the bottom line of everything is fine operationally so why add headcount? What they do value is development for our core product. If something were to slow down development (ie ticket turnaround for new/altered schema) then they would notice and react. Certainly not perfect or all encompassing, but developer tickets is right now the only way I've justified additional headcount.

For me the main measure is % completion within SLA. Of course this assumes that you have agreed SLAs for all your activities. For break/ fix, this is driven by the urgency assigned when the ticket is raised. For scheduled activities (new DB builds etc) this is pre-defined or agreed at the time the ticket is raised. If your team cannot deliver a consistent completion rate within SLA of 97% plus, then you maybe should be looking at the structure of your team.

One thing to also consider, is the type of application(s) running on the databases(s). A database serving up data for a well written application - no matter what size DB or load or anything else - is going to require a lot less time from a DBA compared to a database serving a poorly written application. I've seen many cases where a developers have general database skills and don't always design correctly for the database they're using. Then they release the application and 2 months later frantically phone the DBA team saying the database is slow ... I think we've all been there. The point is, database maintenance efforts have a direct relationship to the quality of the application running with it. Just another thing to consider along with the metrics.

A new question araises here, does this hold true for different types of databases? Teradata? SQL Server? MySQL? Sybase? DB2?

In my previous post, I suggests that SQL Server is somewhat easier to manage (i.e. more instances per DBA) but I think that has a smaller effect than other things. My own experience is that the type of database has less influence on the team's efficiency than other elements. For example, I two teams in comparable technical environments but widely varying efficiencies (60 instances per DBA versus 110). The differentiating factors seem to be non-technical: quality of documentation; maturity of ITIL processes, overhead induced by processes.

Sometimes, I have a different thought again like the number of DBA's depending upon size of the database doesn't help. In the era of big data size doesn't matter, number of tickets, average ticket resolution time, SLA trends etc, DB criticality (Tier 1,2,3) etc are very important when compared to the DB size. I find the size of the database has little to do with the workload of the DBA. We've got a number of smaller databases that require many hours of DBA work simply due to the activity levels of the business. Some of our internal clients are continually making changes or creating short-term applications (some of these are online games or projects related to broadcast programming). These projects usually have very tight deadlines and require considerable attention from our DBAs. 

I don't think you can base a DBAs workload on simply the number of tickets (one ticket might required 10 mins of work while another could be a week). Size of the database isn't necessarily an indicator of workload either. I think the best measurement is going to be the hours required to work on a given project or task. And, even then, the result may be different depending upon the amount of experience a DBA has. 

If you had a ticketing system that is rigorously used both for tracking and time spent and you were able to report on it by database, then that would be a proxy for what you are describing. Excellent.

Once again, there is no unique formula. Two major examples: 

1. If DBA # 1 works 1-hour on a ticket and DBA # 2 works also 1-hour on another ticket: Is the work they do the same? 
Answer: NO. 

2. If DBA # 1 handles daily 20 databases and so does DBA # 2: Are they equally busy? 
Answer: NO.

Measuring only on the basis of ticket is wrong. But if we add the complexity of tickets and client issue which were not resolved by others and one is consuming time of 8 hour to resolve where as deliverable is too much critical at the same time another DBA using 8 hours for small normal daily activity is also having same importance. In short DBA's are valued resource to the company and their respective managers have to decide depending on scenarios and current requirement what is fruitful for him/her.

DBA workload in my opinion should be measured on the complexity type of the tasks and the duration it takes to complete. 

For example, you can have a DBA work all day working on one big project, say 11g R2 Upgrade. 

You can have another DBA do 50 BAU tickets.

DBA workload should be measured on the DBA skill set level & how much Projects/BAU operations he/she can handle. Nowadays, it is not only the technical skills that matter, we have to measure them also on the soft skills they possess. If there is a gap in that area, projects does suffer due to communication issues. 

Long story short, we should categorize DBA workload as Level 1 (Entry Level), Level 2 (Medium), Level 3 (Complex). 

Measuring DBA workload will be kind of like a tuning exercise. You start with a baseline on your team strength and the tasks in hand and based on the upcoming activities, can easily compute whether you can handle the additional workload with the current resource pool (or) if you need to hire additional resources (or) limit projects.

I agree that the DBA workload should be measured as Entry, Medium or Complex. Most ticketing systems do not have such a meter used or if it exists its values are often questionable due to the reason that people who create, assign or approve DBA tickets cannot qualify the complexity of the work. Surprising enough, many generic managers still cannot comprehend the difference between compressing a file with WinZip and compressing a table with Oracle Advanced Compression.

1 comment:

Alex said...

Pretty nice post.
Thanks for sharing it.
Alex.