SharePoint 2010 Optimization: Start with SQL

If you need more than 4Gb of storage for your SharePoint 2010 environment and blob linking is not for you, working with SQL Server is on your plate. 2010 has a lot of databases, each and every one of which has a role and performance profile. Knowing these can help you squeeze out that extra bit out of your environment.

A typical, or maybe not so depending on your individual needs/budget/etc, SharePoint 2010 environment has the potential to look like this. If yours does not, and almost certainly it does not 🙂 , I’m sure you can see similarities.

Viewing this in a compartmentalized fashion skewed towards where you can gain literal and alluded performance improvements you have roughly three major sections.

A: The database
B: The WFEs
C: The UI / General payload

I’m going to focus on the databases, what they are, and what they do. With knowledge comes power as they say.

So starting with the basics…


System
A normal installation of SQL drops a couple of databases on the disk.
Expect to see:

  1. MASTER: Records all system level information for a SQL Server instance. E.g. logins, configuration, and other databases.
  2. MSDB: Records operators and is used by SQL Server Agent to schedule jobs and alerts.
  3. MODEL: Is used as the template for all the databases created in the instance.
  4. TEMPDB: All temporary tables, temporary stored procedures, and anything else that is ‘temporary’ is stored here. Like a cache this is recreated every time the SQL Server instance is started. 

Notes:
  • Of these four, only the last one TEMPDB really should be a candidate for its own spindle. 
  • Size wise, TEMPDB is the only one that you should expect any type of growth from.
  • All four scale vertically, as in you can only allow them to grow in size rather than add in extra databases.
Reporting Services
Reporting Services are often used with SharePoint 2010s Excel, Visio, and PerformacePoint Services but are not required. Access Services does require them, specifically the SQL Server 2008 R2 Reporting Services Add-in. 
Expect to see:
  1. ReportServer: This stores all the metadata for reports such as definitions, history, scheduling, and snapshots. When ReportServer is actually in use the report documents are stored in SharePoint Content databases.
  2. ReportServerTempDB: When reports are running temporary snapshots are found here.

Notes:
  • Both of these databases must coexist on the same database server. 
  • They have a heavy read load.
  • The ReportServerTempDB can grow if there is heavy use of cached snapshots. Just like with the System databases, all four scale vertically, as in you can only allow them to grow in size rather than add in extra databases.
Overview 1
Okay, so now we have a bunch of databases sitting on our server making it look a tad like this:
Now Lets start to add in SharePoint 2010…
Microsoft SharePoint Foundation 2010
Microsoft SharePoint Foundation 2010 drops a number of databases to your disks. (If you install Search Server 2010 Express you will see some of these databases. I have put a * next to the name if it is a factor in play for that kind of scenario.)
Expect to see:

  1. Configuration*: As inferred by its name, this is where information about the environment is stored. Databases, IIS sites, Web Applications, Trusted Solutions, Wb Part Packages, site templates, blocked file types, quotas, they’re all in here. There can only be one configuration database per farm and you can expect it to remain relatively small.
  2. Central Administration Content*: Just like the configuration database, there can only be one, it does really grow much, and as its name infers it stores the Central Administration sites content.
  3. Content*: All site content is stored in these guys. Site pages, documents, lists, web part properties, audit logs, user names and rights, they’re all in here. Office Web Application data is also stored in here.
  4. Usage*: The Usage and Health Data Collection Service Application uses this database. This write heavy database is unique in that it is the only database that can be (and should be) queried directly by external applications. There can only be one per farm. Health monitoring and usage data used for reporting and diagnostics is in here.
  5. Business Data Connectivity*: External Content Types and related objects are in here. It is typically a read heavy yet small sized database. 
  6. Application Registry*: You really only see this if backward compatibility with the BDC API is needed. Typically this can be deleted AFTER upgrading is complete.
  7. Subscription Settings: Features and settings for hosted customers gets put in here. It is not something you normally see as it has to be manually created using Powershell or SQL Server. If it is present you can expect it to remain small and busy only from reads.
Note:

  • Content databases can hold multiple site collections and their content. 
  • They can get BIG. Their size is dependent on size and count of both content and users.
  • 200GB should be your ceiling. Odds are you will see a performance hit as you pass that number. 1TB or more are an option but for Record Centers and the like. I.e. non collaborative usage and data retrieval (read only.) Add more content databases as you reach that 200GB level.
  • Due to its nature it is wise to move the Usage database to a separate spindle.
Overview 2
Okay, so now we have Foundation running with a bunch more databases sitting on our server making it look a tad like this:







Right so now lets move on to the Standard edition.

Microsoft SharePoint Server 2010 Standard
Compared directly to Foundation, Microsoft SharePoint Server 2010 Standard drops a number of extra databases to your disks. (If you install Search Server 2010 Express you will see some of these databases. I have put a * next to the name if it is a factor in play for that kind of scenario.)
Expect to see:
  1. Search Administration*: Used by the Search application the ACL and configuration information is in here. Growth can be vertical or horizontal – if you create new instances of the service application. 
  2. Crawl*: Again used by the Search application, the state of the crawled content/data and the historical crawl records are in here. This database can get a bit big and if you are crawling LARGE amounts of data it is a good idea to not have it on the same server as the next database, Property. It is also a good idea to be using the Enterprise version of SQL so that you can leverage the data compression functionality. This is a read heavy database.
  3. Property*: Again used by the Search application associated data to the crawls is stored in here such as history, crawl queues, and general properties. If you have large volumes of data, move this to its own server and you will see performance improvements with regard to search query result returns. This is a write heavy database.
  4. Web Analytics Reporting*: Used by the Web Analytics application, aggregated report tables, fact data grouped by site – date – asset, diagnostic information can be found in here. Depending on your policies this database can get massive. As it grows scale out by adding more databases. 
  5. Web Analytics Staging*: Temporary data (unaggregated) is in here. 
  6. State*: Used by the State Service application, InfoPath Forms Services, and Visio Services. Temporary state information for Exchange is also in here. More State databases can be added via PowerShell. 
  7. Profile: Used by the User Profile service application, users and their information is found and managed in here. Additional databases can be created if you create additional service instances. Expect lots of reads and medium growth.
  8. Synchronization: Again, used by the User Profile service application. It contains configuration and staging data for use when a directory service such as AD is being synched with. Scaling up or out is the same as with Profile. Its size can be influenced by not just the number of users and groups but their ratios as well. Expect fairly balanced read vs write activity.
  9. Social Tagging: Again, used by the User Profile service application. As users create social tags and notes they are stored, along with their URLs in here. Expect growth to be representative of how much your community embraces the functionality of social tagging. Scaling up or out is the same as with Profile. This is definitely more read than write heavy. 
  10. Managed Metadata Service: Syndicated content types and managed metadata is stored in here. Growth can be vertical or horizontal – if you create new instances of the Managed Metadata service application. 
  11. Secure Store*: Account names, passwords, and their mappings are stored in here. This is where you may have to collaborate with other internal groups as it is suggested that this database be hosted on a seperate database instane with limited access. Growth can be vertical or horizontal – if you create new instances of the service application. 

Note:
  • Watch the Web Analytics Reporting database. It can get very big.
  • Use compression for Crawl if you have the Enterprise version of SQL.
Overview 3
Okay, so now we have SharePoint Server 2010 Standard edition running with a bunch more databases sitting on our server making it look a tad like this:

Right so now lets move on to the Enterprise edition…

Microsoft SharePoint Server 2010 Enterprise
Microsoft SharePoint Server 2010 Enterprise drops extra databases on top of the Standard list to handle Word and PerformancePoint.
Expect to see:
  1. Word Automation Services: Used by the Word Automation service application information about impending and completed document conversions is to be found in here. This database can be expected to remain relatively small.
  2. PerformancePoint: Used by the PerformancePoint service application settings, persisted user comments, and temporary objects are stored in here. It can be expected to be read heavy and can be scaled up or out.
Overview 4
Okay, so now we have SharePoint Server 2010 Enterprise edition running on our server making it look a tad like this:

Where else can we go from here? Well there’s always Project Server , PowerPivot, and FAST… All are complicated products that require SharePoint Server 2010 Enterprise.





Microsoft Project Server 2010





Expect to see:

  1. Draft: Not accessible to end users this is used to store data used by the project queue. There can only be one as far as I know. 
  2. Published: When a project is published this is where it lives. Timesheets, resources, custom fields, and all the other metadata is stored in here. This database can get large.
  3. Archive: Backup data of projects, resources, calendars, etc. is in here. Growth can be limited, if it is not and is used it will get large.
  4. Reporting: This is the repository of the entire project portfolio. When you publish a project plan,  a copy will be put in here. Read heavy and large…





Microsoft FAST Search Server 2010 for SharePoint





Expect to see:

  1. Search Administration: Stores and manages search setting groups, keywords, synonyms, promotion/demotions, best bets, and search schema metadata. It should stay on the small side and remain read heavy. 




Microsoft SQL Server PowerPivot for SharePoint





Expect to see:

  1. PowerPivot Service Application: Cached and loaded PowerPivot files are in here along with usage data and schedules. You can expect this database to remain small.  
Note:
  • PowerPivot files can be rather big and PowerPivot stores data in content databases as well as the central administration database. 
So where does that leave us? If you have everything above installed you should expect to see something like this…
Looking at these databases and what could / should be 

candidates for having their own platters you should start to see something like this: 












Note that the Secure Store should be moved primarily for security purposes. 


It has to be mentioned as well that each and every environment can prove to be unique so you may not see value in moving anything other than your content databases to another platter.

So what does this look like in the wild? Viewing a screenshot of, an admittedly deliberately slightly duplicated environment,  you can see these  databases as they would appear in real life…

Their file system looks like this:

Clearly there is job security for DBAs going forward :). There is also a level of complexity that merits caution and planning when considering SharePoint 2010. The phrase a stitch in time continues to hold value…

Search Server Express 2010: Trial period for this product is about to expire

Recently I did a Search Server 2010 Express deployment. Everything went without a hitch. Preflight, Installation, configuration, and first few crawls all worked the first time. (As they should!)
After 48 hours I cycled back to check the status of the system and found an issue. Specifically in Central Administration > Review Problems and Solutions > All Reports I found under the Configuration Category “Trial period for this product is about to expire.
Diving in for further detail I saw that the actual source of the error was that “The licensing information not available.” After a little bit of further researching I confirmed that to fix it all you have to do is:
  1. Go to “Central Admin” => Monitoring => “Review Rule Definitions”
  2. Click on “Trial period for this product is about to expire” rule under configuration
  3. Edit it
  4. Uncheck the “Enabled” checkbox.
  5. Save
Useful links

Searching an entire database for a Guid or Unique Identifier

Searching an entire database for a Guid or Unique Identifier can be a bit of a tricky proposition. However a little bit of T-SQL can help…

DECLARE @GUIDHunted nvarchar(60)
SET @GUIDHunted = ‘0A24EC0C-65EE-4519-89DF-ABD3DD24F7EF’

SELECT *, ‘UNION ALL SELECT ”’ + s.name + ‘.’ + ao.name + ”’, count(*) FROM ‘
+ s.name +’.[‘ + ao.name + ‘] WHERE ‘ + ac.name + ‘ = ”’ + @GuidHunted + ””
FROM sys.all_columns ac
JOIN sys.all_objects ao
ON ac.[object_id] = ao.[object_id]JOIN sys.schemas s
ON ao.[schema_id] = s.[schema_id]where user_type_id = 36 — UniqueIdentifier
and s.name != ‘sys’

Here is the output result for a fictional database (just copy the results into a new query window, delete the first UNION ALL and execute).

UNION ALL SELECT ‘myschema.Objects’ , count(*) FROM myschema.Objects WHERE ObjectID = ‘0A24EC0C-65EE-4519-89DF-ABD3DD24F7EF’

UNION ALL SELECT ‘myschema.Objects2’ , count(*) FROM myschema.Objects2 WHERE ObjectID = ‘0A24EC0C-65EE-4519-89DF-ABD3DD24F7EF’

This assumes that programmers weren’t storing guids in nvarchar types or that the programmers didn’t create other user types using unique identifiers.

SharePoint Services Timer Service File System Cache Reset / Clear

Reseting the file system cache can resolve many issues in a SharePoint farm. Perform this operation first if you get stuck with any issues related to timer jobs.
File system cache should be cleared on all servers in the server farm on which the Windows SharePoint Services Timer service is running. To do this, follow these steps:
1. Stop the Timer service.
To do this, follow these steps:
a. Click Start, point to Administrative Tools, and then click Services.
b. Right-click Windows SharePoint Services Timer, and then click Stop.
2. Delete or move the contents of the following folder:
a. %ALLUSERSPROFILE% Application DataMicrosoftSharePointConfigGUID
b. Leave the cache.ini alone
c. Delete all other files (all guid.xml) these are all timer job definitions
d. Open cache.ini in notepad and change whatever number you see there to 0
3. Start the Timer service:
To do this, follow these steps:
a. Click Start, point to Administrative Tools, and then click Services.
b. Right-click Windows SharePoint Services Timer, and then click Start.
Note: The file system cache is re-created after you perform this procedure. Make sure that you perform this procedure on all servers in the server farm on which the Timer service is running.
Go back to the %ALLUSERSPROFILE% Application DataMicrosoftSharePointConfigGUID folder and make sure you see a bunch of xml files.
Open the cache.ini and see if the 0 is replaced by a higher value.
You have now synched all your servers with the same timer job definitions from the config db.

A Thought on Monte Carlo Simulation Using Parallel Asynchronous Web Services with .NET and SharePoint

Monte Carlo Simulation is a technique used to estimate the likely range of outcomes outputted by a complex process by simulating the actual process with randomly selected data generating conditions that are true to the process model a large number of times. (In fact, the more you do it the better your data.) The Monte Carlo method is best applied whenever a deterministic solution would either be too computationally intensive or if such a solution does not exist whatsoever.

Monte Carlo Simulation is used in/with

  • Physical sciences
  • Design and visuals
  • Finance and business
  • Telecommunications
  • Games

Monte Carlo Simulation is not a “what if” process. What if’s require single point estimates and use deterministic modeling. Basically you are using best case, worst case, and so on. By using Monte Carlo you consume large random samplings, sourced from probability distribution functions, to produce a large range of outputs which in turn can allow you with greater confidence to produce a narrower range of outputs. In other words you are not using equal weights for each scenario.

Why is this pertinent? Well, stay with me on this one, Markov chain methods are extremely useful for generating sequences of random numbers to accurately reflect rather complicated desired probability distributions, via a process called Markov chain Monte Carlo methods. A tool that is used to generate simulations from a probability distribution…

The Google PageRank of a webpage is defined by a Markov chain.

And the penny drops…

Now, back to the point.

Depending on the degree of accuracy ultimately required, millions or billions of points may need to be tried. Distributing billions of point calculations across multiple servers running Monte Carlo Simulations via web services would parallelize the process and generates results VERY quickly. Good in concept but how to do it?

As defined by the W3C a web service is “a software system designed to support interoperable machine-to-machine interaction over a network.” Running web services on IIS has advantages not limited to:

  • You can grow your “cluster” by just deploying the web service to new nodes.
  • Each web service call with IIS is a thread which should have obvious and positive performance implications.
  • Web services provide a relatively simple and straightforward method of distributing parallel problems across multiple compute platforms.
  • Web services are written like traditional functions, they are easily parallelized without hand-coding a multi-threaded application, custom writing a message passing interface or using other high performance computing management software.

Needless to say, unless your requirements can be served by parallel computations, which would have no dependency on others in the pipe, this is going to become very difficult or rather “challenging” 🙂 very, very quickly.

So how could SharePoint fit in? SharePoint is perfect for acting as a landing point for your data. In and out. Companies benefit by building intelligence into their document libraries and lists with workflows. With workflow, SharePoint can act as a central hub for the data, sending it out to a queue which distributes to nodes on the network. Upon return, the data could be used to populate lists, document libraries, notify people/groups, and more. Search, BDC, Security, and all the other features in SharePoint make this concept a compelling one.

Grep: using it with Windows

Ever need to search files for that ‘piece’ of text? If you come from a UNIX background and are now on Windows GREP is a lacking tool. I recently tested out a few GREP tools for Windows and was pleased with the results. The one I liked the most, of the few I tried, was Windows Grep.

Windows Grep is a tool for searching files for text strings that you specify. Although Windows and many other programs have file searching capabilities built-in, none can match the power and versatility of Windows Grep.”

http://www.wingrep.com/

It has all the usual nice features, regex, soundex, etc.

There’s a great article on using Grep here.


Enabling the PDF iFilter in WSS

To install the Adobe PDF IFilter and to configure the Windows SharePoint Services Search service, follow these steps:

  1. Download and then install the Adobe PDF IFilter from the following Adobe Web site:
    http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611 (http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611)

    Microsoft provides third-party contact information to help you find technical support. This contact information may change without notice. Microsoft does not guarantee the accuracy of this third-party contact information.

  2. Add the following registry entry, and then set the registry entry value to pdf:
    HKEY_LOCAL_MACHINESOFTWAREMicrosoftShared ToolsWeb Server Extensions12.0SearchApplicationsGatherSearchExtensionsExtensionList38

    To do this, follow these steps:

    1. Click Start, click Run, type regedit, and then click OK.
    2. Locate and then click the following registry subkey:
      HKEY_LOCAL_MACHINESOFTWAREMicrosoftShared ToolsWeb Server Extensions12.0SearchApplicationsGUIDGatherSearchExtensionsExtensionList
    3. On the Edit menu, point to New, and then click String Value.
    4. Type 38, and then press ENTER.
    5. Right-click the registry entry that you created, and then click Modify.
    6. In the Value data box, type pdf, and then click OK.
  3. Verify that the following two registry subkeys are present and that they contain the appropriate values.

    Note These registry subkeys and the values that they contain are created when you installed the Adobe PDF IFilter on the server.

    • HKEY_LOCAL_MACHINESOFTWAREMicrosoftShared ToolsWeb Server Extensions12.0SearchSetupContentIndexCommonFiltersExtension.pdf

      This registry subkey must contain the following registry entry:

      • Name: Default
        Type: REG_MULTI_SZ
        Data: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
    • HKEY_LOCAL_MACHINESOFTWAREMicrosoftShared ToolsWeb Server Extensions12.0SearchSetupFilters.pdf

      This registry subkey must contain the following registry entries:

      • Name: Default
        Type: REG_SZ
        Data: (value not set)
      • Name: Extension
        Type: REG_SZ
        Data: pdf
      • Name: FileTypeBucket
        Type: REG_DWORD
        Data: 0x00000001 (1)
      • Name: MimeTypes
        Type: REG_SZ
        Data: application/pdf
  4. Upload the PDF documents to the Windows SharePoint Services 3.0 Web site.
  5. Stop and then start the Windows SharePoint Services Search service. To do this, follow these steps:
    1. Click Start, click Run, type cmd, and then click OK.
    2. Stop the Windows SharePoint Services Search service. To do this, type net stop spsearch at the command prompt, and then press ENTER.
    3. Start the Windows SharePoint Services Search service. To do this, type net start spsearch at the command prompt, and then press ENTER.
    4. Type exit to exit the command prompt.

Note If the Windows SharePoint Services 3.0 Web site already had PDF documents before you installed the Adobe PDF IFilter on the server, you may have to perform an additional step. After you install and configure the Adobe PDF IFilter on the server, modify a property of the existing PDF documents on the server. For example, modify a metadata property such as the title, or upload the PDF documents to the Windows SharePoint Services Web site again. When you do this, the Windows SharePoint Services Search service crawls the existing PDF documents again.

Nifty: Microsoft Business Data Catalog Definition Editor for Microsoft Office SharePoint Server 2007

Ah Microsoft, you giveth and you taketh.


The Microsoft Business Data Catalog Definition Editor for Microsoft Office SharePoint Server 2007 aka MSBDCDEFMOSS2007 (all me…) is a nifty yet hobbled tool fresh(ish) from Microsoft. In brief its features include:

  • Underlying XML is abstracted by the design surface and properties window
  • Drag and drop web methods, tables, or views to create line of business (LOB) connections.
  • Entities and methods are created automatically from database metadata and WSDLs.
  • Additional method instances can be added to further enhance the database or web service connection.
  • Method instances can be tested from within the tool, enabling incremental development of LOB connections

That said… Here’s the catch. (For now I hope…)
The tool is unusable with SQL Server 2005/2008 schemas… So, you cannot use it against AdventureWorks, BUT you can use it against AdventureWorksDW. If you dare to try, you will be rewarded with a “Could not process Table ‘….’. Also, make sure you have SELECT Rights on the Table/VIEW”.

Helpful links

Nifty: SharePoint Search Service Tool

I discovered a very handy project on Codeplex recently, The SharePoint Search Service Tool. To quote their own site:

“The SharePoint Search Service Tool is a rich web service client that allows a developer to explore the scopes and managed properties of a given SharePoint Search SSP, build queries in either Keyword or SQL Syntax, submit those queries and examine the raw web service results. This tool can be useful in troubleshooting and verifying the behavior and configuration of a SharePoint environment.

The SharePoint Search Service Tool is an evolution of a tool that used to be available back on GotDotNet moons ago that was called the MOSSQueryTool. Because the tool has been enhanced and should work with all SharePoint Search products, it has been renamed for its initial public release.”

Right now it is at V1.1 and is well worth a look see: http://www.codeplex.com/SharePointSearchServ