<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SharePoint.Sharon &#187; taxonomy</title>
	<atom:link href="http://www.sharepointsharon.com/tag/taxonomy/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sharepointsharon.com</link>
	<description>news and tips about SharePoint and friends</description>
	<lastBuildDate>Thu, 26 Aug 2010 16:03:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Classifying content in SharePoint</title>
		<link>http://www.sharepointsharon.com/2008/06/classifying-content-in-sharepoint/</link>
		<comments>http://www.sharepointsharon.com/2008/06/classifying-content-in-sharepoint/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 08:27:00 +0000</pubDate>
		<dc:creator>Sharon Richardson</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[columns]]></category>
		<category><![CDATA[taxonomy]]></category>

		<guid isPermaLink="false">http://www.sharepointsharon.com/2008/06/classifying-content-in-sharepoint/</guid>
		<description><![CDATA[This is a follow on post to Taxonomy in MOSS (SharePoint Server 2007). Not everyone knows that you can manage columns at the site collection level and then re-use them across libraries. Also, whilst SharePoint doesn&#8217;t really do taxonomy management, you can use lists to perform some lightweight management, such as controlling what values are [...]]]></description>
			<content:encoded><![CDATA[<p>This is a follow on post to <a href="http://www.joiningdots.net/blog/2008/05/taxonomy-in-moss.html">Taxonomy in MOSS</a> (SharePoint Server 2007). Not everyone knows that you can manage columns at the site collection level and then re-use them across libraries. Also, whilst SharePoint doesn&#8217;t really do taxonomy management, you can use lists to perform some lightweight management, such as controlling what values are used within metadata columns. Here&#8217;s how to set it all up. Note: if you have multiple site collections, you would need to repeat this process for each site collection. You will need to be a Site Administrator to perform the following steps.</p>
<p>Scenario: We want all documents created or uploaded into any library within the site collection to be classified by Business Unit. To ensure the data entered is consistent, users will be presented with a menu from which to choose the appropriate Business Unit. The menu will be populated with values stored in a SharePoint list. Whenever the SharePoint list is updated (i.e. to add/remove/rename business units), the menu will automatically display the changes. The image below shows the basic architecture:</p>
<p align="center"><img alt="Taxonomy in MOSS" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-arch2.jpg" height="405" width="498" /></p>
<p align="center"><em>Image 1: outline architecture</em></p>
<p><strong>Step 1: Create your taxonomy lists. </strong></p>
<p>Because we are going to create a column that looks up values held in a SharePoint list, first we need to create the SharePoint list. You need to create the list(s) in the top-level site of the site collection (Joining Dots in this example). In the image below, I have created a list called &#8216;Business Unit&#8217;. If I want to add an item to the list, I simply click New and enter the title of another business unit:</p>
<p align="center"><img alt="A SharePoint list" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-list1.jpg" height="284" width="274" /></p>
<p align="center"><em>Image 2: A SharePoint List</em></p>
<p><strong>Step 2: Create the site column.</strong></p>
<p>The next step is to create the site column that will look up the values in the SharePoint list. Click the Site Actions button at the top-right of the SharePoint page and click &#8216;Site Settings&#8217;. (Hint: If you can&#8217;t see the Site Actions button, you don&#8217;t have the required permissions &#8211; you need to be a Site Administrator). It is important to navigate to the top-level site. On the Site Settings page, view the options under &#8216;Site Collection Administration&#8217; (circled in red in the image below). If you don&#8217;t see the list of options, you should see a single link &#8216;Go to top level site settings&#8217;. Click on it.</p>
<p align="center"><img alt="SharePoint Site Collection Administration" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-sitesettings1.jpg" height="241" width="499" /></p>
<p align="center"><em>Image 3: SharePoint Site Collection Administration</em></p>
<p>Assuming you are at the top level site, under Galleries, click Site columns. You will be presented with a list of the existing site columns. Click &#8216;Create&#8217; to create a new one and you will be presented with a page like the one below (the red arrows are mine):</p>
<p align="center"><img alt="Create a SharePoint Site Column" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-sitecolumn.jpg" height="585" width="499" /></p>
<p align="center"><em>Image 4: Create a SharePoint Site Column</em></p>
<ul>
<li>Give the column a name (in this example, &#8216;Business Unit&#8217;). </li>
<li>For the type of column, select &#8216;Lookup (information already on this site)&#8217;</li>
<li>Under Group, for the first time, select New group and give it a name (in this example, &#8216;Our Taxonomy&#8217;). After that, use the same group; this makes it easy to locate your taxonomy columns</li>
<li>Under Additional Column Settings, choose if you want the column to be mandatory or not (&#8216;Require that the column contains information&#8217;) and configure the look-up:
<ul>
<li>Under &#8216;Get information from:&#8217;, select the SharePoint list (in this example, &#8216;Business Unit&#8217;). </li>
<li>Under &#8216;In this column:&#8217;, select the column within the list that contains the values you want to use in this column. (In this example, it is &#8216;Title&#8217;. You can see the column label on display in image 2)</li>
</ul>
</li>
<li>Click OK to create the column</li>
</ul>
<p><strong>Step 3: Configure a document library to use the site column</strong></p>
<p>Navigate to a document library where you want to use this site column. In this example, we have a sub-site called &#8216;Library&#8217; containing a document library called &#8216;Documents&#8217; (yes, in hindsight, I could have used better names to avoid confusion).</p>
<p align="center"><img alt="A SharePoint Document Library" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-doclibrary.jpg" height="185" width="499" /></p>
<p align="center"><em>Image 5: A SharePoint document library</em></p>
<p>In the document library, click the Settings button and do NOT choose the obvious option of &#8216;Create Column&#8217;. Instead, select &#8216;Document Library Settings&#8217; and you will be presented with the screen shown below:</p>
<p align="center"><img alt="SharePoint Document Library Settings" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-doclibsettings.jpg" height="410" width="490" /></p>
<p align="center"><em>Image 6: SharePoint Document Library Settings</em></p>
<p>The clue is circled in red again. Click &#8216;Add from existing site columns&#8217; and you will be presented with the following screen:</p>
<p align="center"><img alt="SharePoint Document Library Column" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-doclibcolumn.jpg" height="306" width="477" /></p>
<p align="center"><em>Image 7: Add existing site column</em></p>
<p>SharePoint has a lot of built-in columns and groups. That is why it helps to use your own group names to organise your own site columns. In this example, the group is called &#8216;Our Taxonomy&#8217; and that filters the available site columns to the one and only &#8216;Business Unit&#8217;. Select the column and click Add. Make sure the &#8216;Add to default view&#8217; check box is selected and click OK.</p>
<p>Back in the document library itself, this time click Upload to add a document to the library:</p>
<p align="center"><img alt="A SharePoint Document Library" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-doclibrary2.jpg" height="194" width="339" /></p>
<p align="center"><em>Image 8: Upload a document</em></p>
<p>After selecting your document and clicking OK, you will be presented with the form to update any properties (values to be entered into columns):</p>
<p align="center"><img alt="Classify a document" src="http://www.joiningdots.net/blog/uploaded_images/classifymoss-uploaddoc.jpg" height="255" width="472" /></p>
<p align="center"><em>Image 9: Classify the document</em></p>
<p>As shown in image 9, the user is presented with a dropdown menu for Business Unit. The list of values in the menu comes from the SharePoint list created in step 1. Hey presto. We&#8217;re done!</p>
<p>Now, it just wouldn&#8217;t be natural to write a SharePoint post without highlighting at least one gotcha to watch out for&#8230; Here are a couple of limitations to be aware of:</p>
<ol>
<li>Each time you classify a document, you select and insert a value from the menu. If you look at image 9, there are two similar business units &#8211; Accounts and Finance. If we decide that we do not need Accounts, we can delete it from the Business Unit list and it will automatically disappear from the menu, across every document library that references the Business Unit list. However, any documents that have already been classified as &#8216;Accounts&#8217; will still show that value in their properties, even though it is no longer available.</li>
<li>This approach is the best way to ensure consistency in your columns across your sites. However, there is no way to prevent people from creating their own columns at the document library level (see image 5), beyond restricting permissions to prevent access to all document library settings and/or providing good user training.</li>
</ol>
<p><strong>Filed in library under</strong>: <a href="http://www.joiningdots.net/library/Elements/Microsoft/sharepoint.html">SharePoint</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sharepointsharon.com/2008/06/classifying-content-in-sharepoint/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Taxonomy in MOSS</title>
		<link>http://www.sharepointsharon.com/2008/05/taxonomy-in-moss/</link>
		<comments>http://www.sharepointsharon.com/2008/05/taxonomy-in-moss/#comments</comments>
		<pubDate>Thu, 22 May 2008 13:00:00 +0000</pubDate>
		<dc:creator>Sharon Richardson</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[taxonomy]]></category>

		<guid isPermaLink="false">http://www.sharepointsharon.com/2008/05/taxonomy-in-moss/</guid>
		<description><![CDATA[On 21st May 2008 I presented to an audience of taxonomy professionals within the UK public sector. The last session of the day, I had 30 minutes to present on &#8220;Taxonomy within Microsoft Office SharePoint Server 2007 (MOSS): Lessons learned from real-world deployments&#8221;. My aim was to briefly explain what MOSS can and cannot do [...]]]></description>
			<content:encoded><![CDATA[<p>On 21st May 2008 I presented to an audience of taxonomy professionals within the UK public sector. The last session of the day, I had 30 minutes to present on &#8220;Taxonomy within Microsoft Office SharePoint Server 2007 (MOSS): Lessons learned from real-world deployments&#8221;. </p>
<p>My aim was to briefly explain what MOSS can and cannot do with taxonomy and provide a few tips on how to leverage MOSS taxonomy features to improve information findability. The session generated quite a bit of note-taking and debate. Here are the slides:</p>
<p align="center"><iframe src="http://docs.google.com/EmbedSlideshow?docid=df7tc7w_1dbth3fhb" frameborder="0" width="410" height="342"></iframe></p>
<p>Key messages from the presentation:</p>
<ul>
<li>MOSS uses elements of taxonomy to improve search and navigation. The core feature is &#8216;columns&#8217;, used for metadata. Case study: a tag-driven user interface created for the New Zealand Ministry of Transport. A great end result but a lot of effort required to implement and maintain</li>
<li>MOSS does not (yet) provide taxonomy management tools. Taxonomy management is about defining and managing schema(s), and classifying content against those schemas</li>
<li>Taxonomy is not the holy grail. Schemas need to continually evolve to be effective. Often there is a disconnect between the language used by those creating the schema and those looking for information that the schema is for. This perhaps explains why folksonomies have achieved more success than official taxonomies, but&#8230; </li>
<li>User tagging is less accurate or consistent than automatic classification. Comment from Google founder Sergey Brin: &#8220;Semantics and tagging are great as long as computers are doing it [not people].&#8221; Automatic classification is by no means perfect either. Accuracy rarely exceeds 70% &#8211; lots of development going on to improve this</li>
<li>4 tips to improve the use and value of taxonomy within MOSS today: </li>
<ol>
<li>Where possible, define columns at the site collection level, not per library. Do it per library and each instance will be treated as a separate crawled property in the index. By managing per site collection, you can also control what values can be entered, improving consistency across sites and libraries</li>
<li>Avoid using sites and sub-sites to mimic file structures (popular when creating file plans). One of the relevance algorithms is URL depth. The deeper the URL, the less relevant and you don&#8217;t want empty sites returned in search results. Alternative approach: create a link-driven UI to mimic the file plan but apply it using columns and store content in as few sites as possible</li>
<li>Check out your sources. When indexing content, if one source has a lot more metadata than others, it can dominate search results. A common issue for mergers and acquisitions, or re-orgs within government. Solution: split the index and/or use federated search</li>
<li>Maximise the effectiveness of automatic metadata, such as titles and descriptions. Avoid bland document titles (e.g. &#8216;meeting notes&#8217; x 50) and irrelevant link titles (&#8216;Click here&#8217; versus a title that describes where the link takes you)</li>
</ol>
<li>Most likely scenarios to want to go beyond MOSS are concept-driven search and automatic classification. You can use bespoke code and lightweight tools like the Faceted Search tool on Codeplex. But it is usually better to engage a partner.</li>
<li>Final case study: legal firm &#8211; lots of taxonomy but just getting search up and running was a big win. People found it easier to find information using basic search than the formal navigation structures created by file plans&#8230;</li>
</ul>
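<p>To make tip 2 a little more concrete, here is a minimal Python sketch of what &#8216;URL depth&#8217; means. It is only an illustration: the function name and the example URLs are my own assumptions, and it simply counts path segments rather than reproducing how MOSS actually weights depth in its ranking.</p>
<pre>
```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the non-empty path segments in a URL (hypothetical helper)."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


# Hypothetical intranet URLs: a file plan mimicked with sub-sites
# versus the same document stored in a flatter structure.
deep = "http://intranet.example.com/sites/uk/finance/budget/plan.docx"
flat = "http://intranet.example.com/finance/plan.docx"

print(url_depth(deep), url_depth(flat))
```
</pre>
<p>The deeper URL has more than twice as many segments as the flat one, which is the kind of difference a depth-sensitive ranking would penalise.</p>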
<p>To download a copy of the presentation: <a href="http://www.joiningdots.net/downloads/mosstips-may08.pdf">MOSSTIPS-May08.pdf</a> (2.7Mb) <--Note I haven't used 'Click here' :-)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sharepointsharon.com/2008/05/taxonomy-in-moss/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rethinking the fileplan</title>
		<link>http://www.sharepointsharon.com/2008/03/rethinking-the-fileplan/</link>
		<comments>http://www.sharepointsharon.com/2008/03/rethinking-the-fileplan/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 12:30:00 +0000</pubDate>
		<dc:creator>Sharon Richardson</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[IA]]></category>
		<category><![CDATA[taxonomy]]></category>

		<guid isPermaLink="false">http://www.sharepointsharon.com/2008/03/rethinking-the-fileplan/</guid>
		<description><![CDATA[Perhaps one of the loudest unspoken messages from the SharePoint conference held recently in Seattle was the need for information architects and managers to work more closely with their user interface (UI) and technology-focused counterparts. Thanks to the Internet, we are unlikely to see a downturn in the market for digital information in the foreseeable [...]]]></description>
			<content:encoded><![CDATA[<p>Perhaps one of the loudest unspoken messages from the SharePoint conference held recently in Seattle was the need for information architects and managers to work more closely with their user interface (UI) and technology-focused counterparts. Thanks to the Internet, we are unlikely to see a downturn in the market for digital information in the foreseeable future. But the methods used to classify, manage and access information are still dominated by techniques taken from the physical world of information &#8211; paper and its storage methods: micro (books) and macro (libraries).</p>
<p>Let&#8217;s pick on &#8216;The Fileplan&#8217;</p>
<p>A common scenario I see in organisations, especially government ones, is the use of a fileplan to store and access content. Here&#8217;s the definition of a fileplan, courtesy of &#8216;<a href="http://dlmforum.typepad.com/Developing_a_Fileplan_for_Local20Government.pdf">Developing a Fileplan for Local Government</a>&#8217; (UK) (My comments in brackets):</p>
<blockquote><p>&#8220;The fileplan will be a hierarchical structure of classes starting with a number of broad functional categories. These categories will be sub-divided and perhaps divided again until folders are created at the lowest level. These folders, confusingly called files in paper record management systems (hence the term &#8216;fileplan&#8217;), are the containers in which either paper records or electronic documents are stored.&#8221;</p>
</blockquote>
<p>And why do we need fileplans?</p>
<blockquote><p>&#8220;An important purpose of the fileplan is to link the documents and records to an appropriate retention schedule.&#8221;</p>
</blockquote>
<p>Really? Just how many different retention schedules does an organisation need to have? One per lowest-level folder? I doubt that. Let&#8217;s create a (very) simple fileplan: Geography &#8211; Business Unit &#8211; Activity</p>
<p>Taking 3 geographies, 3 business units and 3 activities. These are the folders you end up with:</p>
<ul>
<li>UK/finance/budget/</li>
<li>UK/finance/managementaccounts/</li>
<li>UK/finance/projects/</li>
<li>UK/IT/operations/</li>
<li>UK/IT/procedures/</li>
<li>UK/IT/projects/</li>
<li>UK/Sales/campaigns/</li>
<li>UK/Sales/products/</li>
<li>UK/Sales/projects/</li>
<li>France/finance/budget/</li>
<li>France/finance/managementaccounts/</li>
<li>France/finance/projects/</li>
<li>France/IT/operations/</li>
<li>France/IT/procedures/</li>
<li>France/IT/projects/</li>
<li>France/Sales/campaigns/</li>
<li>France/Sales/products/</li>
<li>France/Sales/projects/</li>
<li>Germany/finance/budget/</li>
<li>Germany/finance/managementaccounts/</li>
<li>Germany/finance/projects/</li>
<li>Germany/IT/operations/</li>
<li>Germany/IT/procedures/</li>
<li>Germany/IT/projects/</li>
<li>Germany/Sales/campaigns/</li>
<li>Germany/Sales/products/</li>
<li>Germany/Sales/projects/</li>
</ul>
<p>So we have 27 different locations to cover 3 geographies with 3 departments and 3 activities. Now scale this up for your organisation. How many different folders do you end up with?</p>
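<p>If you want to try the scaling exercise for yourself, the list above can be rebuilt with a few lines of Python. This is just a sketch: the variable and function names are my own, and the values are the ones from the example.</p>
<pre>
```python
# Illustrative sketch: rebuild the simple fileplan listed above.
# GEOGRAPHIES, ACTIVITIES and build_fileplan are hypothetical names.
GEOGRAPHIES = ["UK", "France", "Germany"]

# Each business unit has its own three activities, as in the list above.
ACTIVITIES = {
    "finance": ["budget", "managementaccounts", "projects"],
    "IT": ["operations", "procedures", "projects"],
    "Sales": ["campaigns", "products", "projects"],
}


def build_fileplan(geographies, activities):
    """Return one folder path per geography / business unit / activity."""
    return [
        f"{geo}/{unit}/{activity}/"
        for geo in geographies
        for unit, unit_activities in activities.items()
        for activity in unit_activities
    ]


folders = build_fileplan(GEOGRAPHIES, ACTIVITIES)
print(len(folders))  # 27
```
</pre>
<p>Add a fourth geography or a fourth activity per business unit and the count jumps to 36; add both and it is 48. Every extra level multiplies the total again.</p>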
<p>The ultimate killer with this scenario? There isn&#8217;t any content in the first 2 levels of the hierarchy. You&#8217;ve got to navigate through 3 levels before you can even start to find what you are looking for. This is because a librarian approach is used for storing and locating information: </p>
<blockquote><p>Go upstairs, &#8216;Technology&#8217; section is on the left, you&#8217;ll find &#8216;Computing&#8217; about halfway along. Third shelf up is &#8216;Programming Languages&#8217;, books organised alphabetically by author&#8230; </p></blockquote>
<p>In the physical world, we can&#8217;t do a &#8216;<a href="http://en.wikipedia.org/wiki/Beam_Me_Up%2C_Scotty">Beam me up, Scotty!</a>&#8217; and magically arrive at the shelf containing the book containing the page(s) we want. But in the digital world, we can. If fans of the fileplan designed Google&#8217;s navigation, it might look something like this:</p>
<p><a href="http://www.joiningdots.net/blog/uploaded_images/ia1-739344.jpg"><img style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://www.joiningdots.net/blog/uploaded_images/ia1-739337.jpg" border="0" /></a></p>
<p>And they probably wouldn&#8217;t include the search box on the first two pages. Fortunately for everyone who uses the Internet to search for information, Google took the &#8216;Beam me up, Scotty!&#8217; approach.</p>
<p>The fileplan approach causes problems for everyone. Authors have to find &#8216;the right&#8217; location to put their stuff. If they are working on anything remotely ambiguous, it is unlikely there will be one clear option. That&#8217;s why everyone ends up defaulting to the &#8216;projects&#8217; folder (&#8216;miscellaneous&#8217; is another popular destination). Search engines that use URL depth as a relevance signal struggle to identify relevant content &#8211; is the folder &#8216;Finance&#8217; more important than a document called &#8216;Finance&#8217; that is two levels deeper in the hierarchy buried under Projects/Miscellaneous? If someone is searching for documents about France, are documents located in the France folder hierarchy more important than documents containing references to France that have been stored in the UK hierarchy? Authors don&#8217;t know where to put their stuff, and searchers can&#8217;t find it. What about those all-important retention schedules? They might be different for different geographies (governments don&#8217;t seem to agree or standardise on anything much, globally) but then what? Do all Finance docs have a different retention schedule to all IT docs? Within Finance, do different teams have different retention schedules? (Quite possibly &#8211; certain financial documents need storing for specific periods of time). Current solution? Sub-divide and conquer, i.e. create yet another level of abstraction in the fileplan&#8230; I have seen solutions where users have to navigate through 6 levels before reaching a folder that contains any content.</p>
<p>So what&#8217;s the alternative?</p>
<p>Perhaps a better question would be &#8216;what&#8217;s <em>an</em> alternative?&#8217; The desire to find one optimal solution is what trips up most information system designs. Here are some of my emerging thoughts. If you&#8217;ve got an opinion, please contribute in the comments because I certainly don&#8217;t have all the answers.</p>
<p><strong>Step 1: Stop thinking physically and start thinking digitally</strong></p>
<p>There are two fundamental problems with the fileplan. First, it originates from the constraints enforced by physical technologies. A paper document must exist somewhere and you don&#8217;t want to have to create 100 copies to cover all retrieval possibilities &#8211; it&#8217;s expensive and time-consuming. Instead, all roads lead to one location&#8230; and it&#8217;s upstairs, third cabinet on the right, second drawer down, filed by case title. This approach creates the second problem &#8211; because content is managed in one place, that one place &#8211; the fileplan &#8211; must cover all purposes, i.e. storage, updates, retention schedule, findability and access. Physical limits required you to think this way. But those limits are removed when you switch to digital methods. What we need are multiple file plans, each suited to a specific purpose.</p>
<p>Information specialists can help identify the different purposes and different &#8216;file plans&#8217; required. Technologists need to help create solutions that make it as easy as possible (i.e. minimal effort required) for authors and searchers to work with information and &#8216;fileplans&#8217;. And user interface specialists need to remind everyone about what happens when you create mandatory metadata fields and put the search box in the wrong place on the page&#8230;</p>
<p>Digital storage of content should be logical to the creators, because authors ultimately decide where they save their documents. Trying to force them into a rigid navigation hierarchy designed by somebody else just means everything gets saved in &#8216;miscellaneous&#8217;. Don&#8217;t aim for a perfect solution. Instead, provide guidance about where &#8216;stuff&#8217; should go. Areas for personal &#8216;stuff&#8217;, team &#8216;stuff&#8217;, community sites, collaborative work spaces, &#8216;best practices&#8217; sites. Ideally, you still want to stick to one location. Not because of any resource constraints but rather to avoid unnecessary duplication that can cause confusion. If an item of content needs to appear &#8216;somewhere else&#8217; then it should be a link rather than a copy, unless a copy is required to fit a different scenario (e.g. publishing a copy of a case study up onto a public web site, but keeping the original held in a location that can only be edited by authors)</p>
<p>To improve relevance of search results, thesauri and controlled vocabularies can help bridge the language barriers between authors and readers. A new starter might be looking for the &#8216;employee manual&#8217;. What they don&#8217;t know is what they are actually looking for is the &#8216;corporate handbook&#8217; or &#8216;human remains guide&#8217; that may contain the words &#8216;employee&#8217; and &#8216;manual&#8217; but never together in the same sentence. The majority of search frustrations come from information seekers using a different language to the one used by the authors of the information they seek. Creating relationships between different terms can dramatically improve relevance of search results. Creating tailored results pages (a mix of organic search results and manufactured links) can overcome differences in terminology and improve future search behaviour.</p>
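<p>As a rough illustration of &#8216;creating relationships between different terms&#8217;, here is a minimal Python sketch of thesaurus-based query expansion. The thesaurus contents and the function name are assumptions of mine, not a SharePoint feature; a real deployment would use whatever thesaurus mechanism the search product provides.</p>
<pre>
```python
# Illustrative sketch of thesaurus-based query expansion.
# THESAURUS and expand_query are hypothetical names, not a SharePoint API.
THESAURUS = {
    "employee manual": ["corporate handbook"],
}


def expand_query(query, thesaurus):
    """Return the normalised query plus any related terms from the thesaurus."""
    key = query.strip().lower()
    return [key] + thesaurus.get(key, [])


print(expand_query("Employee Manual", THESAURUS))
# ['employee manual', 'corporate handbook']
```
</pre>
<p>The new starter still types &#8216;employee manual&#8217;, but the search also runs against the term the authors actually used.</p>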
<p>And the elephant in the file system &#8211; retention schedules. First identify what retention schedules you require to comply with industry regulations and to manage legal risk. And do they apply to all content or only certain content? (I doubt many government organisations have kept junk paper mail for 30 years.) And at what point do they need to be applied? From the minute somebody opens a word processor tool and starts typing, or at the point when a document becomes finalised? This is the area that needs most coordination between information specialists and technologists. As we start to move to XML file formats, life could potentially become so much easier for everyone. For example, running scripts to automatically track documents for certain words that give a high probability that the document should be treated as a record and moved from a community discussion forum to the archive. Automatically inserting codes that enable rapid retrieval of content to comply with a legal request but that have no effect on relevance for regular searches. </p>
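<p>To give a flavour of the kind of script I have in mind, here is a deliberately naive Python sketch that flags text as a probable record when enough trigger words appear. The word list, the threshold and the function name are all assumptions for illustration; a real rule set would come from your records policy and would need far more than keyword matching.</p>
<pre>
```python
import re

# Hypothetical trigger words; a real list would come from the records policy.
RECORD_TERMS = {"contract", "invoice", "agreement"}


def looks_like_record(text, terms=RECORD_TERMS, threshold=2):
    """Flag text as a probable record when enough trigger words appear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = words.intersection(terms)
    return len(hits) >= threshold


print(looks_like_record("Signed contract and final invoice attached"))
print(looks_like_record("Lunch on Friday?"))
```
</pre>
<p>The first message mentions two trigger words and is flagged; the second mentions none and is left alone in the discussion forum.</p>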
<p>On the Internet, Google introduced a tag &#8216;nofollow&#8217; that could be applied to links to prevent the link improving a page&#8217;s relevance rank. (PageRank works by determining relevance based on the number of incoming links to a page. If you want to link to a page so that people can look at it but you don&#8217;t want the page to benefit from the link in search results, you can insert &#8216;nofollow&#8217;). Maybe Enterprise Search solutions need a similar method. Different indicators for metadata that helps describe content for searches versus metadata that organises content for retention schedules versus metadata that helps authors remember where they left their stuff. And again, XML formats ought to make it possible to automatically insert the appropriate values without requiring the author to figure out what&#8217;s needed. The ultimate goal would be to automatically insert sufficient information within individual content items so that requirements are met regardless of where the content is stored or moved to. I email an image to someone and its embedded metadata includes its fileplan(s).</p>
<p>There are lots of ways that technology could be used to improve information management and findability, to meet all the different scenarios demanded by different requirements. But to achieve them requires closer interaction between people making the policies regarding how information is managed, people creating the so-called &#8216;technology-agnostic&#8217; (in reality it is &#8216;technology-vendor-agnostic&#8217;) file plans to satisfy those policies and the technology vendors creating solutions used to create, store and access the content being created that have to cope with the fileplans and the policies.</p>
<p>The information industry has to move on from the library view of there being only one fileplan. Lessons can be learned from the food industry. There was a time when there was only one type of spaghetti sauce. In the TED talk below, Malcolm Gladwell explains how the food industry discovered the benefits from offering many different types of spaghetti sauce (and why you can&#8217;t rely on focus groups to tell you what they want &#8211; another dilemma when designing information systems):</p>
<p align="center"><object id="VE_Player" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=" height="285" width="320" align="middle" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000"><param name="_cx" value="8467"><param name="_cy" value="7541"><param name="FlashVars" value=""><param name="Movie" value="http://static.videoegg.com/ted/flash/loader.swf"><param name="Src" value="http://static.videoegg.com/ted/flash/loader.swf"><param name="WMode" value="Window"><param name="Play" value="0"><param name="Loop" value="-1"><param name="Quality" value="High"><param name="SAlign" value="LT"><param name="Menu" value="-1"><param name="Base" value=""><param name="AllowScriptAccess" value="always"><param name="Scale" value="NoScale"><param name="DeviceFont" value="0"><param name="EmbedMovie" value="0"><param name="BGColor" value="FFFFFF"><param name="SWRemote" value=""><param name="MovieData" value=""><param name="SeamlessTabbing" value="1"><param name="Profile" value="0"><param name="ProfileAddress" value=""><param name="ProfilePort" value="0"><param name="AllowNetworking" value="all"><param name="AllowFullScreen" value="false"><embed src="http://static.videoegg.com/ted/flash/loader.swf" flashvars="bgColor=FFFFFF&#038;file=http://static.videoegg.com/ted/movies/MALCOLMGLADWELL_high.flv&#038;autoPlay=false&#038;fullscreenURL=http://static.videoegg.com/ted/flash/fullscreen.html&#038;forcePlay=false&#038;logo=&#038;allowFullscreen=true" quality="high" allowscriptaccess="always" bgcolor="#FFFFFF" scale="noscale" wmode="window" width="320" height="285" name="VE_Player" align="middle" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer"></embed></object></p>
<p align="center"><a href="http://www.ted.com/index.php/talks/view/id/20">Direct link to TED talk (in case video doesn&#8217;t load here)</a></p>
<p>There is a great quote within the above talk:</p>
<blockquote><p>&#8220;When we pursue universal principles in food, we aren&#8217;t just making an error, we are actually doing ourselves a massive disservice&#8221;</p></blockquote>
<p>You could replace the word &#8216;food&#8217; with &#8216;information&#8217;. It&#8217;s not just the fileplan that needs rethinking&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sharepointsharon.com/2008/03/rethinking-the-fileplan/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Just Enough Taxonomy</title>
		<link>http://www.sharepointsharon.com/2007/05/just-enough-taxonomy/</link>
		<comments>http://www.sharepointsharon.com/2007/05/just-enough-taxonomy/#comments</comments>
		<pubDate>Tue, 08 May 2007 14:07:00 +0000</pubDate>
		<dc:creator>Sharon Richardson</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[taxonomy]]></category>

		<guid isPermaLink="false">http://www.sharepointsharon.com/2007/05/just-enough-taxonomy/</guid>
		<description><![CDATA[On Microsoft&#8217;s Channel 9 network, there is an interesting podcast called &#8216;Just Enough Architecture&#8217;, where the interviewee provides some good recommendations about the balance between how much architecture you need versus just getting on and writing software that does something useful. The same debate could be applied to taxonomy, specifically the use of metadata properties [...]]]></description>
			<content:encoded><![CDATA[<p>On Microsoft&#8217;s <a href="http://channel9.msdn.com/">Channel 9</a> network, there is an interesting podcast called &#8216;<a href="http://channel9.msdn.com/Showpost.aspx?postid=241305">Just Enough Architecture</a>&#8217;, where the interviewee provides some good recommendations about the balance between how much architecture you need versus just getting on and writing software that does something useful.</p>
<p>The same debate could be applied to taxonomy, specifically the use of metadata properties to classify content.</p>
<p>For some reason, most companies who decide they want to improve how content is classified seem to want extreme taxonomy, swinging from not-enough taxonomy to too-much. The mantra may sound somewhat familiar:</p>
<blockquote><p>One taxonomy to rule them all, one taxonomy to find them, one taxonomy to bring them all and, in the records management store, define them</p>
</blockquote>
<p>Often starting with none at all (i.e. content is organised informally and inconsistently using folders), the desire is to create a single corporate taxonomy to classify everything (using a hierarchical structure of metadata terms). An inordinate amount of time is then spent defining and agreeing the perfect taxonomy (for some reason, many seem to settle on about 10,000 terms). Several months later, heads are being scratched as people try to figure out just how they are going to implement the taxonomy. Do they classify existing content or only apply it to new stuff? Do they have specific roles dedicated to classifying the content, rely on the content owners to do it, or look at automated classification tools? Do they put rules in place to force people to classify content and store it in specific locations that are &#8216;taxonomy-aware&#8217;? How do they prevent people bypassing the system, those who figure they can still get their work done by switching to a wiki or a <a href="http://office.microsoft.com/en-gb/groove/default.aspx">Groove</a> workspace or a <a href="http://www.myspace.com/">MySpace</a> site or a <a href="http://twitter.com/">Twitter</a> conversation? How do they validate the taxonomy and check that people are classifying correctly? What do they do about people who aren&#8217;t classifying correctly, who don&#8217;t understand the hierarchy or have different meanings for the terms in use? What started out as a simple idea to improve the findability of information becomes a huge burden to maintain with questionable benefits, given there are so many opportunities for classification to go wrong.</p>
<p>This dilemma reveals two flaws that make implementing a taxonomy so difficult. The first is the desire to treat taxonomy as a discrete project rather than an organic one. Collaboration and knowledge management projects often share this fate. Making taxonomy a discrete project usually means tackling it all in one go from a technology perspective and then handing it over to the business to run &#8216;as is&#8217; for evermore (i.e. until the next technology upgrade). Such projects end up looking like that old cliché &#8211; attempting to eat an elephant whole. The project team tries to create a perfect design that will deliver all identified requirements (and the business, knowing this could be its one chance for improved tools, delivers a loooooong list of requirements), implements a solution and then moves on to the next project. As the solution is used, the business finds flaws in its requirements or discovers new ways of working enabled by the technology, but it is too late to get the solution changed. The project is closed, the budget spent.</p>
<p>An alternative approach is to treat taxonomy as an organic project or, for those who prefer corporate-speak, a continuous-improvement programme. Instead of planning to create and deploy the perfect taxonomy, concentrate on &#8216;just enough taxonomy&#8217;. A good starting point is to find out why taxonomy is needed in the first place. If it is to make it easier for people to find information, first document the specific problems being experienced. Solve those problems as simply as possible, test the solutions and gather feedback. If successful, people will raise the bar on what they consider good findability, generating new demands for IT to solve, and so the cycle continues.</p>
<p>The following is a simple example using a fictitious company.</p>
<p>Current situation: Most information is stored in folders on file shares and shared via email. There is an intranet that is primarily static content published by a few authors. The IT department has been authorised to deploy <a href="http://www.microsoft.com/sharepoint/default.mspx">Microsoft Office SharePoint Server 2007</a> (MOSS).</p>
<p>General problem: Nobody can find what they are looking for (resist temptation to sing U2 song at this moment&#8230;)</p>
<p>Specific problems: Difficult to find information from recently completed projects that could be re-used in future projects; Difficult to differentiate between high quality re-usable project information and low quality or irrelevant project information; Difficult to find all available documents for a specific customer (contracts, internal notes, project files).</p>
<p>Possible solution: Deploy a search engine to index all file folders and the intranet. Move all project information to a central location. Within the search engine, create a scope (or collection) for the project information location. Users will then be able to perform search queries that return only project information in the results. Using &#8216;date modified&#8217; as the sorting order will locate information from the most recent projects. Create a central location for storing top-rated &#8216;best practice&#8217; project information. Set up a team of subject matter experts to work with project teams and promote documents as &#8216;best practice&#8217;. The Best Practices store can be given high visibility throughout the intranet and promoted as high relevance for search queries.</p>
<p>Now that is a very brief answer outlining one possible solution. But the solution is relatively simple to implement and should offer immediate (and measurable) improvements based on feedback regarding the problems people are experiencing. There were two red herrings in the requirements that could have resulted in a very different, more complex, solution: 1. That MOSS was going to be the technology; and 2. The need to find documents for a specific customer. When you have chosen a technology, there is always the temptation to widen the project scope. MOSS has all sorts of features that can help improve information management and the starting point is often to replace an old crusty static intranet. But the highlighted problems did not mention any concerns about the intranet. That&#8217;s not to say those concerns do not exist, but they are a different problem and not the priority for this project. The second red herring is a classic. When people want to be able to find information based on certain parameters, such as all documents connected to a specific customer, there is the temptation to implement a corporate-wide taxonomy and start classifying all content, starting with the metadata property &#8216;customer name&#8217;. But documents about a specific customer will likely contain the customer&#8217;s name. In this scenario, the simplest solution is to create a central index and provide the ability for users to search for documents containing a given customer&#8217;s name. If that fails to improve the situation then you may need to consider more drastic measures.</p>
<p>Rejecting the large-scale information management project in favour of small chunks of continuous &#8216;just enough&#8217; improvement is not an easy approach to take. The idea of having a centralised, classified and managed store of content, where you can look up information based on any parameter and receive perfect results, continues to be an attractive one with lots of benefits to the business &#8211; both value-oriented (i.e. helping people discover information to do their job) and cost-oriented (i.e. managing what people do with information &#8211; compliance checks and the like). But a perfectly classified store of content is a <a href="http://en.wikipedia.org/wiki/Utopia">utopia</a>. Trying to achieve it can result in creating systems that are harder to use and difficult to maintain when the goal is supposed to be to make them easier.</p>
<p>I mentioned that the common approach to implementing taxonomy has two flaws. The first has been discussed here &#8211; how to create just enough taxonomy. The second flaw is the desire to create a single universal taxonomy that can be applied to everything. I&#8217;ll tackle that challenge in a separate post (a.k.a. this post is already too long&#8230;)</p>
<p><strong>Reference:</strong> <a href="http://channel9.msdn.com/Showpost.aspx?postid=241305">Just Enough Architecture</a> (MSDN Channel 9). Highly recommended. There are plenty of similarities between software architecture and information architecture (of which taxonomy is a subset). Don&#8217;t be put off by the techie speak: it debates the pros and cons of formal processes and informal uses, and includes some great non-technical examples for how to find a balance.</p>
<p><strong>Recent related posts:</strong></p>
<ul>
<li><a href="http://www.joiningdots.net/blog/2007/05/metacrap.html">Metacrap</a></li>
<li><a href="http://www.joiningdots.net/blog/2007/04/why-taxonomy-fails.html">Why taxonomy fails</a></li>
<li><a href="http://www.joiningdots.net/blog/2007/03/when-taxonomy-fails.html">When taxonomy fails</a></li>
</ul>
<p>Technorati tags: <a href="http://www.technorati.com/tag/taxonomy">Taxonomy</a>, <a href="http://www.technorati.com/tag/tagging">Tagging</a>, <a href="http://www.technorati.com/tag/information+architecture">Information Architecture</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sharepointsharon.com/2007/05/just-enough-taxonomy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SharePoint and stemming</title>
		<link>http://www.sharepointsharon.com/2007/01/sharepoint-and-stemming/</link>
		<comments>http://www.sharepointsharon.com/2007/01/sharepoint-and-stemming/#comments</comments>
		<pubDate>Tue, 02 Jan 2007 18:50:00 +0000</pubDate>
		<dc:creator>Sharon Richardson</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[install & config]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[taxonomy]]></category>

		<guid isPermaLink="false">http://www.sharepointsharon.com/2007/01/sharepoint-and-stemming/</guid>
		<description><![CDATA[Happy New Year! Sooo, this blog has been a little quieter than planned recently, due to other activities taking priority. But hopefully it will be back on track during January. A quick post for starters. I&#8217;ve always been more than a little bit interested in the search capabilities within SharePoint, ever since Microsoft introduced probabilistic [...]]]></description>
			<content:encoded><![CDATA[<p>Happy New Year! <img src='http://www.sharepointsharon.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Sooo, this blog has been a little quieter than planned recently, due to other activities taking priority. But hopefully it will be back on track during January.</p>
<p>A quick post for starters.</p>
<p>I&#8217;ve always been more than a little bit interested in the search capabilities within SharePoint, ever since Microsoft introduced probabilistic ranking in SharePoint Portal Server 2001. I can still do a pretty mean explanation of how the Okapi algorithm ranks search results and compare it to PageRank.</p>
<p>Anyways, there is a useful Microsoft blog specialising in the search stuff &#8211; <a href="http://blogs.msdn.com/miketag/default.aspx">Mike Taghizadeh</a>. He&#8217;s just written a couple of articles on word stemming. Word stemming helps determine the documents returned when you enter a search query. Mike <a href="http://blogs.msdn.com/miketag/archive/2006/12/27/moss-search-word-stemming-part-2.aspx">talks all about it</a>, so here is the short version.</p>
<ul>
<li>When you submit a query in SharePoint, the query is broken into individual words. For example, the query &#8220;securing the database&#8221; would be broken down into &#8220;securing&#8221;, &#8220;the&#8221;, and &#8220;database&#8221;</li>
<li>Noise words can be eliminated, i.e. common words such as &#8220;and&#8221;, &#8220;the&#8221;, &#8220;or&#8221;, that are unlikely to influence results. In this example, &#8220;the&#8221; would be dropped from the query</li>
<li>The query words can then be stemmed for variations. For example, a query for &#8220;security&#8221; could be expanded to include documents that refer to &#8220;securing&#8221;, &#8220;securely&#8221; and so on</li>
<li>The query words will also be compared against the thesaurus. The thesaurus is customisable and very useful for words with domain-specific alternatives or abbreviations. You can choose to expand queries (e.g. expand &#8220;PMB&#8221; to also search for &#8220;Purple Medium Board&#8221;) or replace queries (e.g. replace &#8220;ie&#8221; with &#8220;Instant Everywhere&#8221; &#8211; &#8220;ie&#8221; will return just about every document in an English-language index).</li>
</ul>
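<p>For anyone who finds code easier to follow than bullet points, the steps above can be sketched roughly as follows. This is a toy illustration in Python and not how SharePoint implements search: the noise word list, the stem group and the function name process_query are all made up for the example, and the thesaurus entries simply reuse the examples from the list. Stemming is off unless you ask for it, mirroring the default described in Mike&#8217;s post.</p>

```python
# Illustrative sketch only: this is NOT SharePoint's implementation.
# Every name and word list below is a hypothetical stand-in.

# Common words that are unlikely to influence results.
NOISE_WORDS = {"and", "the", "or"}

# Hypothetical stem groups; a real stemmer is language-specific.
STEM_GROUPS = [{"secure", "securely", "securing", "security"}]

# Hypothetical thesaurus entries, reusing the examples from the post.
EXPANSIONS = {"pmb": ["purple medium board"]}     # search for both
REPLACEMENTS = {"ie": ["instant everywhere"]}     # search for these instead


def process_query(query, stemming=False):
    """Return one list of alternative terms per remaining query word."""
    # Step 1: break the query into individual words.
    words = query.lower().split()
    # Step 2: drop noise words.
    words = [w for w in words if w not in NOISE_WORDS]

    groups = []
    for word in words:
        # Step 4 (replace): the original word is discarded entirely.
        if word in REPLACEMENTS:
            groups.append(list(REPLACEMENTS[word]))
            continue
        alternatives = [word]
        # Step 3: add stemmed variations, only if stemming is enabled.
        if stemming:
            for group in STEM_GROUPS:
                if word in group:
                    alternatives += sorted(group - {word})
        # Step 4 (expand): keep the word and add its thesaurus alternatives.
        alternatives += EXPANSIONS.get(word, [])
        groups.append(alternatives)
    return groups
```

<p>Calling process_query("securing the database") leaves two terms, with "the" dropped as a noise word. Passing stemming=True widens "securing" to include its variations, which is exactly why stemming returns a larger set of results.</p>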
<p>In his post, Mike mentions that word stemming is turned off by default. I&#8217;ve just checked on my demo laptop and he&#8217;s right. If you want to turn on word stemming, here&#8217;s how:</p>
<ul>
<li>Go to the search page, enter any old query to return the search results page</li>
<li>Under Site Actions, select &#8216;Edit page&#8217;</li>
<li>Locate the &#8216;Search Core Results&#8217; web part (usually in the bottom zone)</li>
<li>From the Edit button, select &#8216;Modify shared web part&#8217;</li>
<li>In the tool bar on the right hand side, under &#8216;Results Query Options&#8217;, check the box labeled &#8216;Enable Search Term Stemming&#8217;</li>
</ul>
<p>And hey presto, it&#8217;s switched on.</p>
<p>Now, before you go automatically enabling the feature, despite it seeming obvious to use it, be warned. Word stemming can affect the relevance of your search query. If some terms have lots of stemming and others have none, one word may now dominate results even if it isn&#8217;t the priority in the context of what you are looking for. Stemming can also negatively affect performance &#8211; there will be a delay whilst expanding the search query to include stemming, and a larger set of results will be returned.</p>
<p>Technorati tags: <a href="http://www.technorati.com/tag/sharepoint">SharePoint</a>, <a href="http://www.technorati.com/tag/sharepoint+2007">SharePoint 2007</a>, <a href="http://www.technorati.com/tag/moss+2007">MOSS 2007</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sharepointsharon.com/2007/01/sharepoint-and-stemming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
