Microsoft has a lot of content published about database sizing and performance for SharePoint 2010. But it spans a number of documents and some questions are either not fully answered or have conflicting options. This post aims to provide a single summary, albeit quite a long one. All sizing recommendations are taken from Microsoft guidelines on TechNet. References at the end of the post. This the second in a series of posts about SharePoint sizing. Part 1 covered the hardware and software requirements.
When deploying SharePoint, you need to be aware of the amount of data storage you will require in total. For example, if you were planning to store up to 100Gb worth of content in SharePoint (documents and list items), the total storage you would require to use the standard ‘out of the box features’ could be 250Gb. That’s before including any form of RAID protection…
Here’s how that number breaks down:
Sample scenario: 1,000 users each with an average of 200 documents and 600 list items (calendar, links, tasks etc.)
  • Content DB = 100Gb
  • Configuration DBs = 5Gb
  • Search DBs = 20Gb
  • User Profile DBs = 2Gb
  • Office Web Apps Cache = 50Gb
  • Transaction logs = 40Gb estimate
  • TempDB = 20Gb estimate
  • Spare capacity = 13Gb
Now before you head to the hills thinking you need to multiple your content DB by 2.5 or more to calculate your total storage requirements to run SharePoint, some of those databases are single instance and don’t have a linear relationship to your content size or user numbers. But the example does highlight the need to calculate your database sizes. If you are told you have 500Gb to work with, you do not have 500Gb to store documents…

Content DBs

There are two ways to  size your content DBs:
  1. You have a fixed amount of storage to begin with, in which case you need to calculate how much of it is going to be available for content
  2. Estimate and purchase enough storage to meet users’ needs
You could look at how much space your file shares currently occupy. But it is rarely an accurate measure as aged file shares will contain a lot of duplication and redundant (expired) data that would be best moved to File 13 (located by use of the Delete key). I’ve seen estimates that 60% of file storage is wasted…
For the second option, Microsoft provides the following formula to calculate the approximate size of your content:
Content size (in KB) = ((D x V) x S) + 10 x (L + (V x D))
Where:
  • D is the number of documents (simple estimate – multiple number of users by average number of docs per user)
  • V is the average number of versions (must be at least 1, more if you plan to keep version history)
  • S is the average size of documents in KB (normal value is in the range 200 – 400KB)
  • L is the number of list items (tasks, calendars, links etc. simple estimate – 3 x number of documents)
  • 10KB is a constant to estimate the amount of storage required for metadata and also the average list item size. Increase it if you have hired an information architect :-)
p.s. Divide by multiples of 1000 or 1024 depending on your preference for bytes or Bytes when converting results from Kilo to Mega to Giga to Tera.
One extra item to note: MySites. In theory, they should be included in the above estimates but from my experience they are not. Determine the quota you will apply to MySites (e.g. 100MB per MySite), multiple it by the number of users and add to the formula above (don’t forget to convert it to GB).
Once you’ve got your content size, you need to consider how many content databases you will need. The content databases will host the site collections created in your SharePoint deployment. Site collections cannot span content databases.
Microsoft recommends the following, for performance and efficient recovery in the event of a failure:
  • Content databases should be no larger than 200GB
  • Site collections should be no larger than 100GB – this enables use of SharePoint 2010 backup tools to move a site collection to another database if needed
  • If a site collection is expected to grow larger than 100GB it should be given its own dedicated content database
  • Content databases of up to 1TB are supported but only for large single-site repositories in which data remains mostly static (i.e. archives and records management)
In SharePoint 2010, there is also a new content database requirement – the Office Web Apps cache.If you install Office Web Apps as part of your SharePoint implementation, the default cache available is 100GB. The amount can be modified using Powershell. It’s also a good idea to provision a dedicated content DB, or the cache will plonk itself in the first content DB it finds (and pity the site collection(s) that have to share DB resources with it).
Also in SharePoint 2010, Remote BLOB Storage (RBS – a feature within SQL Server 2008 Enterprise Edition) is now supported and should be considered for very large content storage requirements – where the total volume is greater than 5TB or when storing very large individual files. RBS means the content DB itself will be small because the actual files are stored direct on the file system. But it does require a more complex back-up process – to cover both the database and the file store.
To summarise the content DB calculations to consider, for quick scanning:
Database Sizing Notes
Content DBs for intranets, extranets and web sites up to 200GB per DB Depending on your definition of an intranet, the size may be significantly smaller than 200GB, i.e. if content is mostly web pages. If streaming multimedia, plan storage of multimedia files separately (inside or out of SharePoint)
Content DBs for collaborative sites up to 200GB per DB For collaborative activities and most document management requirements. Consider a 1:1 mapping for site collections anticipated to grow larger than 50Gb (i.e.most content DBs will be up to 100GB to match site collection storage limit)
Archives
Content DBs
up to 1TB per DB For large single-site repositories storing inactive content such as records, contracts and reference materials. But also consider using Remote Blob Storage (requires SQL Server 2008 Enterprise Edition).
MySites
Content DB
up to 200GB per DB Larger deployments with generous quotas for MySites may need to plan for requiring more than 1 content DB
Office Web Apps cache 100GB Default setting when OWA installed. Can be adjusted.

Configuration DBs

I’ve lumped together a range of databases that fall under configuration of some sort. You may not require all of them. Here’s Microsoft’s guidelines:
Database Sizing Notes
Config 2GB Actual size will likely be under 1Gb for small to medium deployments. But transaction logs can be large and advised to recovery model from full (default) to simple to restrict growth of the log file or back up the log file regularly to force truncation (not recommended to attempt manual truncation)
Central Admin Content 1GB Only one databaser per farm. Size will grow faster if using PowerPivot as it also stores the spreadsheets and tables used for the PowerPivot management dashboard
Usage and Health data collection 2GB 2GB would allow storage for up to 1 million HTTP requests. Can configure to only retain data for a certain period or log only page requests versus everything or don’t use at all… IOPS should be considered (is write-heavy)- 115 x page hits/sec or 5 x HTTP requests
Subscriptions
Settings
1GB Only needed if enabled through Powershell (not enabled by default) – is used for hosted deployments of Sharepoint
State 1GB+ Only one database per farm. Stores temporary state information for InforPath forms service, the chart web part and Visio services. Actual size will depend on usage of those features. One document recommends 1Gb, another suggests it may be 100GB+…
Secure Store 5Mb per 1,000
credentials
Used for single sign-on – stores and maps credentials
Web Analytics – Staging up to 100GB Size affected by number of reports being generated, retention period, daily volume of data being tracked and number of site collections, sites, and sub-sites in the web applications being analysed.
Web Analytics – Reporting depends… Stores aggregated standard report tables, fact data aggregated by groups of sites, date and asset metadata and diagnostics info for the web analytics service. Microsoft’s database types and descriptions document states this can grow very large (i.e. up to 1TB and beyond…)

Services DBs

The various services within SharePoint require databases. Which ones you need to include in your sizing calculation depends on which services you plan to use…

Search Service

Database Sizing Notes
Admin 1GB+ Only one per service instance. Hosts service configuration information, ACLs and and best bets for the crawl component. Accessed for every user and administrative action.  Actual size will depend on number of content sources, crawl rules and best bets plus traffic. Should fit into available RAM on server for optimal performance.
Crawl (one or more) 4.6% of content DBs but… Stores the state of the crawled data and crawl history – up to 25 million items per Crawl DB. IOPS requirements are significant – 3,500 to 7,000. Also allow 0.011 x content DBs for transaction log and 0.00011 x content DBs for tempDB files.
Properties (one or more) 1.5% x content DBs but… Stores information associated with crawled data including properties, history and crawl queues – up to 50 million items per property DB. You should be able to fit at least one third into RAM on the server (for fast queries) IOPS requirements are significant – 2,000. Also allow 0.0031 x content DBs for log file and 0.00034 x content DBs for tempdb files
FAST Search Admin 1GB+ Only if deploying FAST Search add-on to SharePoint. Stores FAST specific config data search as synonyms, visual best bets and relevance rank adjustments (property and site promotion/demotion. As well as additional features, FAST scales indexing beyond 100 million items
Index (divided into partitions, mapped to Query Components) Index size: 3.5% x content DBs

Disk space required: 3 or 4 x index size
The actual index. SharePoint can index up to 100 million items and it is recommended that you partition the index so that there are no more than 10 million items per index partition. The calculation is 3.5% x total content being index to calculate the total index size and multiply by 4 if planning to repartition, otherwise multiply by 3 to support master merges (if only having one partition).
Note: content DBs used in calculations here should include any other stores you plan to index (e.g. file shares, Exchange public folders etc.
There are conflicting recommendations regarding the search databases and index size. The technical case studies do not match the formulae provided in the capacity planning document and the database types and descriptions document appears to have based its assumptions on indexing upwards of 50TB of content. Most deployments will be a lot smaller than that…
As a very rough guide allow 20% the total size of content DBs for the search DBs and index, not including copies of the index partition. Suggested by Microsoft based on an internal deployment indexing 13.3TB of SharePoint content. This is a big improvement over SharePoint 2007 which could need upwards of 40%.
Here’s how the sample scenario works out, using the above calculations:
Sum of content DBs being indexed (600,000 items) 100GB
Total index size (will be a single partition)
.
4GB
Disk space required for index (including master merge – 3 x index size) 11GB
Search Admin DB 1GB
Crawl DB 4.5GB
Property DB
.
1.5GB
Total disk space required (not including logs or tempdb files)
- Database server (search admin, crawl DB and property DB):
- Each server hosting the query component and index partition:
.
7GB
11GB

User Profiles Service

Database Sizing Notes
Profiles 1MB per user profile but… Only one per service instance. Stores user properties, newsfeeds and memberships. Capacity Planning recommends 1MB per user profile. Database Types and Descriptions advises will be medium to large (up to 1TB)… actual size will depend on number of properties and size of newsfeeds but inclined to trust Capacity Planning (supported by test results).
Profile Sync approx 630KB per user profile but… IOPS requirements are significant – 3,500 to 7,000. Database Types and Descriptions claims are similar to for Profiles. Actual size will depend on actual user profile size.
Social Tagging 0.009MB per tag, comment or rating IOPS requirements are significant – 2,000. Difficult to estimate as size will depend on user culture (Database Types and Descriptions suggests the DB will be small to very large – 1GB to 1TB+…).
My formula to estimate social tagging: 10% of users will be most active, guestimate number of items they are likely to create on average per month, multiple by 36 to allow for 3 years growth. Then use a smaller average for the remaining 90% of users; else go for an average number of ratings per user over 3 years x number of users x 0.009MB.

Data Services

Database Sizing Notes
Business Data Connectivity Service 0.5MB per external content type Recommended 50MB as default
Application Registry Service 1GB Only required if upgrading from SharePoint 2007 Business Data Catalog (BDC)
Managed Metadata up to 100GB One database per service instance. Stores managed metadata and syndicated content types. Affected by number of content types and keywords used. Capacity Planning doc does not provide guidelines. Database Types and Design doc suggests medium size (up to 100GB) depends on amount of managed metadata.Suspect the use of managed metadata is stored here and not just the definitions – would explain why it’s a special column type within sites that is read-only in datasheet view and doesn’t work in Office 2007 (and why search can use it to use keywords as search refining properties)…
Word Automation 1GB One database per service instance. Stores info about pending and completed document conversions.
PerformancePoint 1GB One database per service instance (Enterprise Edition only). Stores temporary objects, persisted filter values and user comments
PowerPivot 1GB Part of SQL Server Enterprise Edition Analysis Services (add-on to Excel for analysing very large data sets). Stores location of cached or loaded data files, data refresh schedules and usage data copied from central usage data collection DB
Project Server DBs up to 100GB per DB Project Server 2010 requires SharePoint 2010 Enterprise Edition. Separate DBs for drafts, published, archives and reporting DBs.
Reporting Services TBC Advanced reporting (SQL Server) plus used for Access services
There are other services DBs but the remainder should only have very small storage requirements. Group them together in an ‘other DBs/space capacity’ category

SQL Server System Databases

SharePoint also requires the standard SQL Server system databases for performance, availability and recovery.
Database Sizing Notes
master < 1GB Records all system-level information for SQL Server. Will be small and significant growth is unlikely.
model < 1GB Used as a template for all databases created on an instance of SQL Servr. Will be small and significant growth is unlikely
msdb < 1GB Used by SQL Server Agent for scheduling jobs and alerts. Will be small and significant growth is unlikely
tempdb depends… Holds temporary objects (working set), recreated each time SQL Server is started.  Will grow and shrink as required and size can range from small to very large depending on activities. Will depend on number of users and data intensive processes such as indexing (see search table above for search usage of tempDB). Use fastest disks possible and monitor free space. Don’t pre-grow, allow tempDB to use up to all of remaining disk space. Insufficient disk space for tempdb will impedeperformanceNote: if using Reporting Services (see under Data Services), there will be a second tempdb for reporting services with same riteria.

Transaction Logs

SQL Server transaction logs record all changes that were made to a database since the last checkpoint or full backup. They are essential if you need to restore a SharePoint server from back-up to its current state prior to a failure.
But how much space to allow for them is a question unanswered. I’ve seen guestimates range from 20 to 50% the size of your databases being logged, but no definitive formula from Microsoft. Plus most advice specifically refers to SQL Server 200 or 2005, even when listed under SQL Server 2008. And there are plenty of horror stories about  log files growing too big. What is most important is to ensure you have a process in place to manage your transaction logs as part of your overall backup strategy. The size of the logs will depend on the level of activity that occurs between backups. I’d be interested to hear opinions from SQL Server experts on this topic.

Disk Volumes

Finally, having come up with a set of databases that you will need for your SharePoint implementation, the last question is how to organise them on your hardware. If deploying on a SAN, Microsoft recommends separate LUNS with each LUN consisting of separate physical disk spindles as follows:
  • Content DBs
  • Search DBs
  • Transaction Logs
  • TempDB files
  • Config and Services DBs
For large farm deployments scaling to include enterprise search capabilitise, consider separate physical hardware for storing the search databases (with own tempdb, transaction logs etc.)

References

Finally…

If you found this post useful, similar content and references can be found in the SharePoint 2010 Handbook on this web site.