SharePoint 2010 Optimization: Start with SQL

If you need more than 4GB of storage for your SharePoint 2010 environment and BLOB linking (remote BLOB storage) is not for you, working with SQL Server is on your plate. 2010 has a lot of databases, each of which has its own role and performance profile. Knowing these can help you squeeze that extra bit of performance out of your environment.

A typical SharePoint 2010 environment (or maybe not so typical, depending on your individual needs, budget, etc.) has the potential to look like this. If yours does not, and almost certainly it does not :), I'm sure you can see the similarities.

Viewed in a compartmentalized fashion, skewed towards where you can gain real or perceived performance improvements, you have roughly three major sections:

A: The database
B: The WFEs
C: The UI / General payload

I'm going to focus on the databases, what they are, and what they do. With knowledge comes power, as they say.

So starting with the basics…


System
A normal installation of SQL Server drops a handful of system databases onto the disk.
Expect to see:

  1. MASTER: Records all system level information for a SQL Server instance. E.g. logins, configuration, and other databases.
  2. MSDB: Records operators and is used by SQL Server Agent to schedule jobs and alerts.
  3. MODEL: Is used as the template for all the databases created in the instance.
  4. TEMPDB: All temporary tables, temporary stored procedures, and anything else that is ‘temporary’ is stored here. Like a cache this is recreated every time the SQL Server instance is started. 

Notes:
  • Of these four, only the last one, TEMPDB, really should be a candidate for its own spindle (a sketch of the move follows these notes).
  • Size wise, TEMPDB is the only one that you should expect any real growth from.
  • All four scale vertically, as in you can only allow them to grow in size rather than add extra databases.
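If TEMPDB does end up on its own spindle, the move itself is small: two ALTER DATABASE statements and a service restart. Here is a minimal sketch from PowerShell via sqlcmd, assuming a dedicated T: drive, a named instance called SHAREPOINT, and the default logical file names (tempdev and templog); all three are placeholders to swap for your own.

# Check the current logical and physical file names first (database_id 2 is TEMPDB)
sqlcmd -S .\SHAREPOINT -Q "SELECT name, physical_name FROM sys.master_files WHERE database_id = 2"

# Point both files at the dedicated spindle
sqlcmd -S .\SHAREPOINT -Q "ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');"
sqlcmd -S .\SHAREPOINT -Q "ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');"

# The new location is only picked up when the instance restarts
Restart-Service 'MSSQL$SHAREPOINT' -Force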
Reporting Services
Reporting Services is often used with SharePoint 2010's Excel, Visio, and PerformancePoint Services but is not required. Access Services does require it, specifically the SQL Server 2008 R2 Reporting Services Add-in.
Expect to see:
  1. ReportServer: This stores all the metadata for reports such as definitions, history, scheduling, and snapshots. When ReportServer is actually in use the report documents are stored in SharePoint Content databases.
  2. ReportServerTempDB: When reports are running temporary snapshots are found here.

Notes:
  • Both of these databases must coexist on the same database server. 
  • They have a heavy read load.
  • The ReportServerTempDB can grow if there is heavy use of cached snapshots. Just like the System databases, both scale vertically, as in you can only allow them to grow in size rather than add extra databases.
Overview 1
Okay, so now we have a bunch of databases sitting on our server making it look a tad like this:
Now let's start to add in SharePoint 2010…
Microsoft SharePoint Foundation 2010
Microsoft SharePoint Foundation 2010 drops a number of databases onto your disks. (If you install Search Server 2010 Express you will see some of these databases; I have put a * next to the name if it applies to that scenario.)
Expect to see:

  1. Configuration*: As its name implies, this is where information about the environment is stored. Databases, IIS sites, Web Applications, Trusted Solutions, Web Part Packages, site templates, blocked file types, quotas: they're all in here. There can only be one configuration database per farm and you can expect it to remain relatively small.
  2. Central Administration Content*: Just like the configuration database there can only be one, it doesn't really grow much, and as its name implies it stores the Central Administration site's content.
  3. Content*: All site content is stored in these guys. Site pages, documents, lists, web part properties, audit logs, user names and rights: they're all in here. Office Web Applications data is also stored in here.
  4. Usage*: The Usage and Health Data Collection Service Application uses this database. This write heavy database is unique in that it is the only database that can be (and should be) queried directly by external applications (see the query sketch after this list). There can only be one per farm. Health monitoring and usage data used for reporting and diagnostics is in here.
  5. Business Data Connectivity*: External Content Types and related objects are in here. It is typically a read heavy yet small database.
  6. Application Registry*: You really only see this if backward compatibility with the BDC API is needed. Typically it can be deleted AFTER upgrading is complete.
  7. Subscription Settings: Features and settings for hosted customers get put in here. It is not something you normally see, as it has to be created manually using PowerShell or SQL Server. If it is present you can expect it to remain small and busy only with reads.
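Since the Usage database is the one database you are allowed to hit directly, it is worth a quick look. A rough sketch only, assuming the default logging database name (WSS_Logging), a SQL instance called .\SHAREPOINT, and the dbo.RequestUsage reporting view; list the views in your own logging database before building anything on top of them.

# See which reporting views the logging database exposes
sqlcmd -S .\SHAREPOINT -d WSS_Logging -Q "SELECT name FROM sys.views ORDER BY name"

# Peek at recent request data (column names vary, so start with a blind TOP 10)
sqlcmd -S .\SHAREPOINT -d WSS_Logging -Q "SELECT TOP (10) * FROM dbo.RequestUsage"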
Note:

  • Content databases can hold multiple site collections and their content. 
  • They can get BIG. Their size is dependent on size and count of both content and users.
  • 200GB should be your ceiling; odds are you will see a performance hit as you pass that number. 1TB or more is an option, but only for Records Centers and the like, i.e. non-collaborative usage and data retrieval (read only). Add more content databases as you approach that 200GB level (see the sketch after these notes).
  • Due to its nature it is wise to move the Usage database to a separate spindle.
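Watching those content databases is easy enough from PowerShell. A hedged sketch using the SharePoint 2010 cmdlets; the database name, web application URL, and SQL instance below are placeholders for your own:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Content databases sorted by size on disk
Get-SPContentDatabase | Sort-Object DiskSizeRequired -Descending |
    Select-Object Name, @{Name = 'SizeGB'; Expression = { [math]::Round($_.DiskSizeRequired / 1GB, 1) }}

# Approaching the 200GB ceiling? Add another content database to the web application
# and create new site collections in it from then on.
New-SPContentDatabase -Name "WSS_Content_Intranet_02" -WebApplication "http://intranet" -DatabaseServer "SQL01\SHAREPOINT"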
Overview 2
Okay, so now we have Foundation running with a bunch more databases sitting on our server, making it look a tad like this:

Right, so now let's move on to the Standard edition.

Microsoft SharePoint Server 2010 Standard
Compared directly to Foundation, Microsoft SharePoint Server 2010 Standard drops a number of extra databases onto your disks. (If you install Search Server 2010 Express you will see some of these databases; I have put a * next to the name if it applies to that scenario.)
Expect to see:
  1. Search Administration*: Used by the Search service application; ACL and configuration information is in here. Growth can be vertical, or horizontal if you create new instances of the service application.
  2. Crawl*: Again used by the Search application; the state of the crawled content/data and the historical crawl records are in here. This database can get a bit big, and if you are crawling LARGE amounts of data it is a good idea not to have it on the same server as the next database, Property. It is also a good idea to be using the Enterprise edition of SQL Server so that you can leverage its data compression functionality. This is a read heavy database.
  3. Property*: Again used by the Search application; data associated with the crawls is stored in here, such as history, crawl queues, and general properties. If you have large volumes of data, move this to its own server and you will see performance improvements in search query result returns. This is a write heavy database.
  4. Web Analytics Reporting*: Used by the Web Analytics application; aggregated report tables, fact data grouped by site, date, and asset, and diagnostic information can be found in here. Depending on your retention policies this database can get massive. As it grows, scale out by adding more databases.
  5. Web Analytics Staging*: Temporary (unaggregated) data is in here.
  6. State*: Used by the State Service application, InfoPath Forms Services, and Visio Services. Temporary state information for Exchange is also in here. More State databases can be added via PowerShell (a sketch follows this list).
  7. Profile: Used by the User Profile service application; users and their information are found and managed in here. Additional databases can be created if you create additional service instances. Expect lots of reads and medium growth.
  8. Synchronization: Again used by the User Profile service application. It contains configuration and staging data for use when a directory service such as AD is being synchronized with. Scaling up or out is the same as with Profile. Its size is influenced not just by the number of users and groups but by their ratios as well. Expect fairly balanced read vs write activity.
  9. Social Tagging: Again used by the User Profile service application. As users create social tags and notes they are stored, along with their URLs, in here. Expect growth to reflect how much your community embraces social tagging. Scaling up or out is the same as with Profile. This is definitely more read than write heavy.
  10. Managed Metadata Service: Syndicated content types and managed metadata are stored in here. Growth can be vertical, or horizontal if you create new instances of the Managed Metadata service application.
  11. Secure Store*: Account names, passwords, and their mappings are stored in here. This is where you may have to collaborate with other internal groups, as it is suggested that this database be hosted on a separate database instance with limited access. Growth can be vertical, or horizontal if you create new instances of the service application.
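As mentioned against the State database above, extra State Service databases are added with PowerShell rather than through Central Administration. A small sketch with placeholder names, assuming the State Service application already exists:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Grab the existing State Service application and hang a second database off it
$state = Get-SPStateServiceApplication
New-SPStateServiceDatabase -Name "StateService_DB_02" -ServiceApplication $state -DatabaseServer "SQL01\SHAREPOINT"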

Note:
  • Watch the Web Analytics Reporting database. It can get very big.
  • Use compression for Crawl if you have the Enterprise edition of SQL Server; a sketch follows.
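On the compression point: with the Enterprise edition of SQL Server you can rebuild the crawl database's indexes with page compression. The sketch below only generates the ALTER INDEX statements so you can review them before running anything; the database name is a placeholder, and the CPU vs I/O trade-off should be tested somewhere other than production.

# Generate (not execute) a page-compression rebuild statement for every table in the crawl database
sqlcmd -S SQL01\SHAREPOINT -d "Search_Service_Application_CrawlStoreDB" -Q "SELECT 'ALTER INDEX ALL ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' REBUILD WITH (DATA_COMPRESSION = PAGE);' FROM sys.tables t JOIN sys.schemas s ON t.schema_id = s.schema_id"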
Overview 3
Okay, so now we have SharePoint Server 2010 Standard edition running with a bunch more databases sitting on our server making it look a tad like this:

Right, so now let's move on to the Enterprise edition…

Microsoft SharePoint Server 2010 Enterprise
Microsoft SharePoint Server 2010 Enterprise drops extra databases on top of the Standard list to handle Word Automation Services and PerformancePoint.
Expect to see:
  1. Word Automation Services: Used by the Word Automation Services service application; information about pending and completed document conversions is found in here. This database can be expected to remain relatively small.
  2. PerformancePoint: Used by the PerformancePoint service application; settings, persisted user comments, and temporary objects are stored in here. It can be expected to be read heavy and can be scaled up or out.
Overview 4
Okay, so now we have SharePoint Server 2010 Enterprise edition running on our server making it look a tad like this:

Where else can we go from here? Well there's always Project Server, PowerPivot, and FAST… All are complicated products that require SharePoint Server 2010 Enterprise.

Microsoft Project Server 2010
Expect to see:

  1. Draft: Not accessible to end users, this is used to store data used by the project queue. There can only be one as far as I know.
  2. Published: When a project is published this is where it lives. Timesheets, resources, custom fields, and all the other metadata are stored in here. This database can get large.
  3. Archive: Backup data for projects, resources, calendars, etc. is in here. Growth can be limited; if it is not, and the archive is used, it will get large.
  4. Reporting: This is the repository of the entire project portfolio. When you publish a project plan, a copy is put in here. Read heavy and large…

Microsoft FAST Search Server 2010 for SharePoint
Expect to see:

  1. Search Administration: Stores and manages search setting groups, keywords, synonyms, promotions/demotions, best bets, and search schema metadata. It should stay on the small side and remain read heavy.

Microsoft SQL Server PowerPivot for SharePoint
Expect to see:

  1. PowerPivot Service Application: Cached and loaded PowerPivot files are in here along with usage data and schedules. You can expect this database to remain small.  
Note:
  • PowerPivot files can be rather big and PowerPivot stores data in content databases as well as the central administration database. 
So where does that leave us? If you have everything above installed you should expect to see something like this…

Looking at these databases and which could or should be candidates for having their own platters, you should start to see something like this:

Note that the Secure Store should be moved primarily for security purposes.


It also has to be mentioned that each and every environment can prove to be unique, so you may not see value in moving anything other than your content databases to another platter.
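Before moving anything, a quick inventory helps the conversation. A sketch using Get-SPDatabase to list every SharePoint database, its size, and the server it currently lives on (the property names here are as I recall them, so verify against your own farm):

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Get-SPDatabase | Sort-Object DiskSizeRequired -Descending |
    Select-Object Name,
        @{Name = 'SizeGB'; Expression = { [math]::Round($_.DiskSizeRequired / 1GB, 1) }},
        @{Name = 'Server'; Expression = { $_.Server } } |
    Format-Table -AutoSize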

So what does this look like in the wild? Viewing a screenshot of an (admittedly deliberately slightly duplicated) environment, you can see these databases as they would appear in real life…

Their file system looks like this:

Clearly there is job security for DBAs going forward :). There is also a level of complexity that merits caution and planning when considering SharePoint 2010. The phrase "a stitch in time saves nine" continues to hold value…

Improving the user experience with password entry

Sometimes deploying a forms based registration / login solution, for SharePoint or anything else, makes me somewhat weary. Specifically, building the system that enforces, for sometimes tens or hundreds of thousands of people, the mundane action of having to enter a password no less than twice, after panning one's brain for a username, when registering for an account just seems so old school and tedious…

Adding a bit of colour, literally, here is a good thing. Not too much, not too little, and you can add the equivalent of a breath of fresh air. One great tool for this job is Chroma-Hash.

Chroma-Hash is a jQuery plugin that aims to offer a better password input experience. A standard password field replaces the entered characters with “●” characters; the problem is that “we never know if we made a typo or not until the form is submitted & a reply is received from the server”. Chroma-Hash “converts the values entered” into a colored visualization.

A user filling in the password field can easily recognize whether or not the colors are the usual equivalents of their password. And anyone who sees the colors won't have any idea what the actual password is.

The script can also be used to visualize password confirmations (password & confirm password fields).

Detailed info on the method & security concerns can be found in the developer's related post.

http://mattt.github.com/Chroma-Hash/

Quantum Cryptography Now Fast Enough For Video…

Researchers at the Cambridge Lab of Toshiba Research Europe have solved the problem of transferring highly sensitive data at high speed across a long distance network. The team were able to demonstrate the continuous operation of quantum key distribution (QKD), a system that allows the communicating users to detect if a third party is trying to eavesdrop on the data communication, at a speed greater than one megabit/sec over a 50 km fibre optic network, thanks to the use of a light detector for high bit rates and a feedback system which maintains the high bit rates during data transfer. … The faster one megabit/sec data handling will allow the one-time pad to be used for the encryption of video, a vast step forward over the current ability to only encrypt voice data.

Saving a packet trace in Mac OS X

  1. Log in with an administrator account.
  2. Open Terminal (/Applications/Utilities).
  3. To start the trace, you will type a command, followed by the Return key. The command you choose needs to match the way your computer connects to the Internet.

    For built-in Ethernet, type:

    sudo tcpdump -i en0 -vvv -n -s 0 -w ~/Desktop/DumpFile.dmp

    Note: Both “en0” and “-s 0” include a zero, not the letter O.

    For AirPort, type:

    sudo tcpdump -i en1 -vvv -n -s 0 -w ~/Desktop/DumpFile.dmp

    Note: “-s 0” includes a zero (0), not the letter O.

    For a VPN connection or a dial-up modem (PPP), type:

    sudo tcpdump -i ppp0 -vvv -n -s 0 -w ~/Desktop/DumpFile.dmp

    Note: Both “ppp0” and “-s 0” include a zero, not the letter O.

  4. When prompted for a password, enter the one for your administrator account. You’ll see a message in Terminal such as “tcpdump: listening on en0…” which lets you know the computer is actively capturing network traffic.
  5. Now, perform the network activities that involve the issue you’re trying to capture packets for.
  6. When you’re ready to stop capturing packets, click the Terminal window to bring it to the foreground.
  7. Press Control-C.

How to: Recover a raw disk image of your OS X FileVault and then read its contents…

FileVault is the encryption mechanism used to protect user accounts on an OS X file system. Disabled by default, it has become more popular since it was originally released with 10.3 “Panther.”

When enabled, FileVault mounts and unmounts encrypted file systems as the user logs in and out. In 10.5 “Leopard” this image is a sparse bundle, recognizable as a collection of 8MB files, a change from before, where a single large file was used. One primary reason for the breakup into 8MB bands is to allow Time Machine backups. A note to upgraders though: be sure to disable and then re-enable FileVault, otherwise you'll be left with the old sparse image format.

The user's home directory is encrypted using AES with a key derived from the user's login password. Content is automatically encrypted and decrypted on the fly. However, this data is not bound as tightly as some assume, nor is the housekeeping as good as others would assume. Succinctly put: “Encrypted volumes preserve much of the data that is deleted from the volume. Because a file vault is treated as a separate file system, a free space wipe does virtually nothing to destroy deleted data stored inside a vault.” Due to Apple's policy of “security by obscurity”, many false assumptions have been made about the FileVault encryption mechanism, potentially exposing sensitive data to someone with the right tools. Users should be aware of FileVault's caveats and limitations before relying on it as a means of securing data.

Here are the surprisingly easy steps to pull together a raw disk image from a user's FileVault, this particular user's name being “beer”. There is an assumption that you already have the key… If you're looking to crack the encryption, look elsewhere for vfcrack, cold boot attacks, and others. This image is suited for FTK or EnCase processing, or carving up with Scalpel or Foremost…

Step 1
$ sudo -s
# ls -l /Volumes/Suspect_Disk/Users
total 0
-rw-r--r-- 1 root wheel 0 Sep 23 2007 .localized
drwxrwxrwt 4 root wheel 136 Oct 5 18:34 Shared
dr-x------ 3 beer staff 102 Oct 10 13:11 beer

Step 2
# ls -l beer
total 0
drwx------@ 6 beer staff 204 Oct 10 12:58 beer.sparsebundle

Step 3
# hdid -readonly beer.sparsebundle

After being prompted for the File Vault password, the hdid tool will give you a mapping to a series of raw devices for the image:

/dev/disk2 Apple_partition_scheme
/dev/disk2s1 Apple_partition_map
/dev/disk2s2 Apple_HFS

Step 4
The third device down, disk2s2, is the one you're interested in; it contains the HFS file system holding the home directory. You can copy off the entire file vault's contents with a simple disk copy:

# dd if=/dev/rdisk2s2 of=/Volumes/Digital_Vault/rdisk2s2 bs=16384

Here’s the annoying part: because the vault is mapped onto the file system, you’re actually going to get a very large file containing a lot of padding; this could be several hundred gig in size depending on the virtual size of the file vault. You’ll need to make sure you have adequate disk space to contain this image.

Once you’ve got the image, it can be accessed much like any other HFS image. You can load it into FTK, or use a data carving tool like Scalpel to get data off of it. In addition to this, you can rename the image to have a .dmg extension and mount the decrypted file system right on your mac:

# mv /Volumes/Digital_Vault/rdisk2s2 /Volumes/Digital_Vault/rdisk2s2.dmg
# hdid -readonly /Volumes/Digital_Vault/rdisk2s2.dmg
/dev/disk3    /Volumes/beer

And that’s pretty much it…

A random tidbit on non random data

I recently was talking with somebody who felt that TrueCrypt hidden volumes were the bee's knees. The scenario they used, and which I myself have read 'musings' about, involved a laptop carrying sensitive corporate data being seized by customs. The laptop drive gets “reviewed”, the secret container is not seen, and the laptop passes as normal and uninteresting. Big deal. The bigger deal is if you have 007 style data and that guy in the uniform is pretty certain you have it as well. My colleague's version of the story ends with an almost Hollywood style exhalation of breath and a cinematic zoom out to the hero walking out the door. That's probably not how it would pan out…

TrueCrypt volumes, which are essentially files, have certain characteristics that allow programs such as TCHunt to detect them with a high *probability*. The most significant, in mathematical terms, is that their size modulo 512 is 0. Now, it is certainly true that TrueCrypt volumes do not contain known file headers and that their content is indistinguishable from random data, so it is difficult to definitively prove that certain files are TrueCrypt volumes. However, their very presence can provide reasonable suspicion that they contain encrypted data.

The actual math behind this is interesting. TrueCrypt volume files have file sizes that are evenly divisible by 512 and their content passes chi-square randomness tests. A chi-square test is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution* when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.
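To make that concrete, here is a rough PowerShell sketch of the two checks, nothing like as refined as TCHunt: the size test and a chi-square statistic over the byte frequencies (for roughly random data the statistic should hover around the 255 degrees of freedom). The file path is a placeholder for whatever you want to test.

$path = "C:\temp\suspect.bin"   # file to test; fine for smallish files, stream it for multi-gigabyte volumes
$bytes = [System.IO.File]::ReadAllBytes($path)

# Test 1: TrueCrypt volumes have a size that divides evenly by 512
$sizeTest = ($bytes.Length % 512) -eq 0

# Test 2: chi-square of the byte frequencies against a uniform distribution
$counts = @(0) * 256
foreach ($b in $bytes) { $counts[$b]++ }

$expected = $bytes.Length / 256.0
$chi = 0.0
foreach ($c in $counts) { $chi += (($c - $expected) * ($c - $expected)) / $expected }

"{0}: size divisible by 512 = {1}, chi-square = {2:N1} (255 degrees of freedom)" -f $path, $sizeTest, $chi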

So what does this all mean? Really nothing for us normal people. For those for whom I have built custom STSADM containers for securing your backups and exports, your data is still secure and will stay that way indefinitely. For those running across the border: a forensic analysis will reveal the presence of encrypted data, TrueCrypt volumes or otherwise, but not much more. Sometimes that's enough to start asking questions or poking further. With the forensic tools, not the dentistry kit.

* A skewed distribution whose shape depends on the number of degrees of freedom. As the number of degrees of freedom increases, the distribution becomes more symmetrical.

http://www.truecrypt.org/
http://16systems.com/TCHunt/

Nmap 5 released

Network security starts with scanning because you need to know what you have so that you can identify your vulnerable points and manage the associated risk. Nmap excels at helping you enumerate your network and identify what is running. Nmap is also a key tool in the fight against Conficker and its ilk and can be used to detect an infected node on a network.

With the release of Nmap 5, billed as the most important release since 1997, there is a noticeable speed advantage with faster scans. Aside from the speed improvements there are new tools such as Ncat and the Nmap Scripting Engine (NSE) that make Nmap 5 a must have.

  • “The new Ncat tool aims to be your Swiss Army Knife for data transfer, redirection, and debugging,” the Nmap 5.0 release announcement states.
  • NSE is all about automating network scanning tasks with scripts. “Those scripts are then executed in parallel with the speed and efficiency you expect from Nmap. All existing scripts have been improved, and 32 new ones added. New scripts include a whole bunch of MSRPC/NetBIOS attacks, queries, and vulnerability probes; open proxy detection; whois and AS number lookup queries; brute force attack scripts against the SNMP and POP3 protocols; and many more.”

Other “stuff” in this version…

  • ncat (allows data transfer, redirection and debugging) – (remember Hobbit's nc?)
  • ndiff scan comparison
  • better performance
  • improved Zenmap GUI (including a really neat feature to visually map the network you have scanned)
  • improvements to the Nmap Scripting Engine (NSE): existing scripts reviewed and 32 new scripts added

A useful, if not must have, tool. It applies not only to security but also to simple things such as trying to find that pesky administrative interface to a WSS or MOSS environment when you cannot get access to the desktop… The more you have and know, the better your options, as they say.
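By way of a quick, hedged example (the host name is a placeholder and nmap is assumed to be on your path): sweep the suspect box for open ports and grab service banners, then narrow in on whatever looks like Central Administration.

# -p- scans all 65535 TCP ports, --open hides closed ones, -sV grabs service/version banners
nmap -p- --open -sV sharepoint01.contoso.local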

http://nmap.org/5/
http://nmap.org/5/#5changes

dd: clean your drive securely

Now, like anybody, I'm a BIG fan of wiping old drives using dd, but sometimes there's a tool out there that will do most if not all of the work for you. Cue DBAN. Or as the site says:

Darik’s Boot and Nuke (“DBAN”) is a self-contained boot disk that securely wipes the hard disks of most computers. DBAN will automatically and completely delete the contents of any hard disk that it can detect, which makes it an appropriate utility for bulk or emergency data destruction.

Complemented with TrueCrypt you will have a mighty secure setup. Possible / definite paranoia issues too… but your data will be secure. For the more command line oriented, the old reliable dd if=/dev/urandom of=/dev/disk bs=1k is good enough imho. (It puts random bits in place as opposed to a regular pattern. Not that it will stand up to NSA level scrutiny, but it's more than enough to defeat most data recovery…)

For more go to:

DBAN: http://www.dban.org/
TrueCrypt: http://www.truecrypt.org/