Life Interesting Bit: BI


Friday, July 8, 2011

From SQL Server Database Administrator to Data Warehouse Administrator

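Whether a company builds its data warehouse on Oracle or on SQL Server, building it is only the beginning. Keeping the warehouse running takes the sustained effort of another group of people: data warehouse administrators (DWAs). They have a lot in common with database administrators (DBAs), but the two roles differ. What responsibilities and tasks does a data warehouse administrator carry? What skills and knowledge does it take to be a good one? How do the DBA and DWA roles relate, and where do they differ? The sections below answer each question in turn.
The booming data warehouse market has also driven demand for data warehouse applications among Chinese companies, especially large enterprises and institutions such as insurers. Today, many companies in China do not train a dedicated data warehouse administrator once the warehouse project is finished; they mostly pick a few people from the development team and leave them to handle maintenance. They do this because, even under normal conditions, an ordinary technician who has received only brief instruction cannot maintain such a complex system, let alone respond to an emergency. Administration is therefore the weak spot of many corporate data warehouse deployments. Keeping a data warehouse system stable, available, and efficient requires administrators with professional training.
Abbreviate Data Warehouse Administrator to its initials, DWA, and many people will assume you mean Data Warehouse Architect. This article is about the administrator, and mainly about administrators working on the SQL Server platform.
A data warehouse administrator is chiefly responsible for maintaining the integrity and availability of the enterprise data warehouse, which includes data quality, and for keeping the warehouse running without interruption. The warehouse in question might be a highly available 5 TB SQL Server 2005 data warehouse used for business intelligence and customer relationship management by several hundred users in dozens of branch offices around the world. It might equally be a 300 GB single-server warehouse that a dozen or so users at head office use for sales, customer, and product analysis. Whichever kind it is, the administrator's most important job is maintenance.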

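A data warehouse system runs a large volume of ETL operations every day, extracting and consolidating data into the warehouse at a set interval: perhaps daily, weekly, or every few hours. One of the DWA's main tasks is to monitor these ETL processes and make sure they run properly. This monitoring matters because ETL is what continuously feeds the warehouse its raw material. If the ETL process does not run when it should, the data in the warehouse goes stale; if it stalls halfway, the data is incomplete; if it runs with errors, the data loaded is wrong. And if the data is wrong or incomplete, every decision based on it is affected. That is why it is so important that ETL runs correctly from start to finish.
Ideally the data warehouse administrator reports to the data warehouse manager, although sometimes the role reports to the data warehouse architect. The key tasks include the following (assuming a SQL Server platform):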
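· Monitor the daily (or weekly) ETL processes, DTS packages, and SQL Server Integration Services (SSIS) jobs
· Administer the data warehouse databases and maintain all database servers
· Administer Analysis Services cubes and servers
· Administer Reporting Services and its servers (quite possibly a web farm)
· Administer data mining models and predictive analytics
· Manage data warehouse security
· Produce reports on data warehouse workload and activity
· Bulk-load new data into the data warehouse
· Install patches and carry out upgrades
· Administer the data warehouse portal
· Back up all data warehouse objects and test restoring them
· Work with the development team to deploy code
· Liaise with business teams to resolve issues with data requests
· Organize training sessions for end users
· Help users with query problems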
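Core technical skills a data warehouse administrator needs:
· Experience maintaining SQL Server databases
· Knowledge of Reporting Services and Analysis Services
· A solid understanding of data warehousing principles
· Familiarity with dimensional modeling
· Knowing how to administer SSIS jobs and DTS packages
· Preferably an MCDBA certification or the MCITP BI certification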
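A data warehouse administrator must know how to tune a data warehouse for performance, must understand how tuning the fact tables of a dimensional data store differs from tuning the transaction tables of an OLTP system, and must understand why the simple recovery model suits the stage and dimensional data stores while the full recovery model suits the operational data store.
In some companies the data warehouse administrator also maintains the reports and SSIS packages, but in most cases those tasks belong to separate data warehouse developers. If the company uses business intelligence and enterprise performance management tools such as Business Objects, SAS, Cognos, Hyperion, MicroStrategy, or ProClarity, the administrator will probably be responsible for those as well. Some companies also set up SharePoint as the Reporting Services portal, so a SQL Server data warehouse administrator needs SharePoint skills too.
The term "data warehouse administrator" is more common in Teradata, DB2, and Oracle shops than in SQL Server shops. Since SQL Server 2005 came out, however, data warehouses built on the Microsoft platform have become increasingly popular. We expect SQL Server 2008, due for release at the end of next February, to give the SQL Server data warehouse administrator role even more scope.
Now that we know what a data warehouse administrator is, how can a SQL Server DBA make a successful move into the role? In fact, the SQL Server DBA is the best candidate for it, closer to the requirements than any other IT position. A SQL Server DBA administers SQL Server databases, maintains user security, configures SQL Server, backs up databases, manages disk space, patches and upgrades SQL Server, and so on. All of these skills are a solid foundation for SQL Server data warehouse work.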

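The skills gap between the two usually lies in Analysis Services cubes. Learning to administer SSIS jobs and Reporting Services is not especially hard (this refers to administration only, not development). Administering Analysis Services cubes is a different matter: you need to spend some time getting familiar with multidimensional database concepts. If your company also uses data mining tools, allow time to learn how to maintain data mining models as well.
Beyond cubes and data mining, the remaining gap is knowledge of data warehousing concepts. A SQL Server data warehouse administrator must be familiar with data warehousing and dimensional modeling concepts (fact tables, dimension tables, slowly changing dimensions, surrogate keys, aggregate tables, summary tables, dimension hierarchies, late-arriving data, conformed dimensions, and so on) and with the purpose of the validity columns in a Type 2 slowly changing dimension. The administrator must also be able to describe how loading differs for transaction fact tables, periodic snapshot fact tables, and accumulating snapshot fact tables.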
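To make the role of the validity columns concrete, here is a minimal in-memory sketch of Type 2 handling. It is an illustration only: the CustomerRow class, its column names, and the Scd2 helper are hypothetical and not taken from any particular warehouse. A change closes the current version by setting its ValidTo date and adds a new version with a fresh surrogate key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical dimension row; the column names are illustrative only.
public class CustomerRow
{
    public int SurrogateKey { get; set; }     // warehouse-generated key
    public string BusinessKey { get; set; }   // key from the source system
    public string City { get; set; }          // the attribute tracked as Type 2
    public DateTime ValidFrom { get; set; }
    public DateTime? ValidTo { get; set; }    // null marks the current version
}

public static class Scd2
{
    // Applies one incoming value: if the tracked attribute changed, the current
    // version is expired and a new version is appended.
    public static void Apply(List<CustomerRow> dim, string businessKey, string city, DateTime loadDate)
    {
        CustomerRow current = dim.FirstOrDefault(r => r.BusinessKey == businessKey && r.ValidTo == null);
        if (current != null && current.City == city) return;   // nothing changed
        if (current != null) current.ValidTo = loadDate;       // expire the old version

        int nextKey = dim.Count == 0 ? 1 : dim.Max(r => r.SurrogateKey) + 1;
        dim.Add(new CustomerRow
        {
            SurrogateKey = nextKey,
            BusinessKey = businessKey,
            City = city,
            ValidFrom = loadDate,
            ValidTo = null
        });
    }

    // Returns the version that was valid at the given point in time, or null.
    public static CustomerRow AsOf(List<CustomerRow> dim, string businessKey, DateTime asOf)
    {
        return dim.FirstOrDefault(r => r.BusinessKey == businessKey
            && r.ValidFrom <= asOf
            && (r.ValidTo == null || asOf < r.ValidTo));
    }
}
```

With this shape, a fact row dated in 2010 can still be joined to the customer's 2010 city after the customer moves, which is the point of keeping the validity columns.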
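Data warehousing concepts are foundational knowledge for the administrator. You do not need experience designing a data warehouse from scratch, but you do need the basics of data warehouse architecture, such as how a data warehouse database differs from an OLTP database. This knowledge matters because it affects tuning, table partitioning (adding new partitions and archiving old ones), indexing, querying, and aggregation. Ideally the administrator also understands how the ETL process works, that is, how data is extracted from source systems and loaded into the target warehouse. As noted earlier, monitoring ETL is one of the administrator's key duties, and this understanding can also shape the backup strategy. The administrator must know whether rerunning a failed ETL process will compromise data integrity.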
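One common way to make a rerun safe is to tag every loaded row with a batch identifier and clear that batch before loading it again. The sketch below shows the idea on an in-memory list; FactRow, BatchId, and RerunnableLoad are hypothetical names used only for illustration, and a real load would do the delete and insert inside one database transaction.

```csharp
using System;
using System.Collections.Generic;

public class FactRow
{
    public int BatchId { get; set; }      // hypothetical load-batch identifier
    public string OrderNo { get; set; }
    public decimal Amount { get; set; }
}

public static class RerunnableLoad
{
    // Loads a batch so that running it twice leaves the same rows:
    // whatever an earlier (possibly half-finished) run of the same batch
    // wrote is removed before the batch is inserted again.
    public static void LoadBatch(List<FactRow> factTable, int batchId, IEnumerable<FactRow> source)
    {
        factTable.RemoveAll(r => r.BatchId == batchId);
        foreach (FactRow row in source)
        {
            factTable.Add(new FactRow { BatchId = batchId, OrderNo = row.OrderNo, Amount = row.Amount });
        }
    }
}
```

A load written without this kind of guard would append the same rows again on a rerun and silently double the totals, which is exactly the integrity question the administrator has to be able to answer.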
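A data warehouse administrator also needs a deeper understanding of data quality. Ensuring that the data in the warehouse is accurate and complete is absolutely critical: if we cannot trust the data in the warehouse, what is the point of having one? The administrator needs to understand the mechanisms that safeguard data quality, for example how a data quality firewall detects bad data before it is loaded into the warehouse, and how the system reports that bad data and gets it corrected. A thorough understanding of the data quality process helps the administrator maintain the quality of the data.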
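A data quality firewall can be pictured as a set of rules applied to each incoming row before it reaches the warehouse. The following is a rough sketch under that assumption; the SalesRow shape and the two rules are invented for illustration, and real rule sets are usually stored as metadata rather than hard-coded.

```csharp
using System;
using System.Collections.Generic;

public class SalesRow
{
    public string CustomerCode { get; set; }
    public decimal Amount { get; set; }
}

public class RejectedRow
{
    public SalesRow Row { get; set; }
    public string Reason { get; set; }
}

public static class QualityFirewall
{
    // Each rule returns null when the row passes, or a reason when it fails.
    // Both rules are made up for this example.
    public static readonly List<Func<SalesRow, string>> Rules = new List<Func<SalesRow, string>>
    {
        r => string.IsNullOrWhiteSpace(r.CustomerCode) ? "missing customer code" : null,
        r => r.Amount < 0 ? "negative amount" : null
    };

    // Splits incoming rows into those allowed into the warehouse and those
    // held back, with the first failing rule's reason, for reporting and correction.
    public static void Screen(IEnumerable<SalesRow> incoming, List<SalesRow> accepted, List<RejectedRow> rejected)
    {
        foreach (SalesRow row in incoming)
        {
            string reason = null;
            foreach (Func<SalesRow, string> rule in Rules)
            {
                reason = rule(row);
                if (reason != null) break;
            }
            if (reason == null) accepted.Add(row);
            else rejected.Add(new RejectedRow { Row = row, Reason = reason });
        }
    }
}
```

The rejected list is what feeds the reporting and correction step mentioned above: it records both the offending row and why it was stopped.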
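Those are the tasks and responsibilities of a data warehouse administrator, and the skills and knowledge needed to carry them out. To anyone hoping to join the ranks of data warehouse administrators: best wishes, and may it go smoothly. Remember, though, that solid skills and knowledge are what secure your place in the role. As data warehousing spreads through the Chinese market, the role should become ever more sought after. More and more companies are considering SQL Server as the platform for their data warehouse, which is a rare opportunity for SQL Server DBAs. Given how much SQL Server 2008 improves its data warehousing features (for example star join query optimization, data compression to improve fact table query performance, change data capture to streamline ETL, and the MERGE statement for upserts), SQL Server data warehouses will only become more attractive.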

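Four Data Warehouse Modeling Approaches for SQL Server

Data warehouse modeling for SQL Server falls into the following four categories.
The first is third normal form (3NF) modeling for relational databases, which we normally use to build operational database systems.
The second is the 3NF data warehouse modeling advocated by Inmon, which differs somewhat in emphasis from 3NF modeling of operational systems.
Inmon's approach has three levels. The first is the entity-relationship level, the enterprise's business data model; at this level the method is the same as for the enterprise's operational systems. The second is the data item set level, where the modeling diverges from that of operational systems according to factors such as how often data is generated and accessed. The third, the physical level, is the concrete implementation of the second.
The third is the dimensional modeling advocated by Kimball, generally also called star schema modeling, sometimes with some snowflaking mixed in. Dimensional modeling is driven by user requirements, easy to understand, and efficient to query, and it is the approach I prefer.
The fourth is a more flexible style, typically used in the back-end staging area. It follows no fixed pattern and aims only to meet the need at hand; the resulting tables expose no interface to users, and most are temporary.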

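A Data-Level Security Solution for BI (reposted)

BI data analysis is a hot enterprise application, and access control matters a great deal to an enterprise, especially for the reports used to make decisions. The current Microsoft SQL Server BI stack is Integration Services + Analysis Services + Reporting Services. Integration Services and Analysis Services are back-end services that are not shown to front-end users, so their own access control is outside the scope of this article (although data-level security does require Analysis Services to take part). For the enterprise reports shown in the front end, access control comes in two kinds: report-level and data-level. Report-level security is fairly simple: it controls who can view a given report. Data-level security is more complex: whoever views a report sees only the data they are authorized to see. Put simply, the general manager and a manager see different data even though they are looking at the same report. Comparing the two, once data-level security is in place, whether reports themselves are access-controlled matters much less (though for a friendlier interface it is still worth doing).
This article describes a data-level security solution for BI on the SQL Server stack. It was a feature that a large German multinational, for whom I implemented a BI project, cared about a great deal. A brief introduction to the customer and project first: for confidentiality I will call the customer Customer S (CS for short, and no, not the game).
The CS project uses SharePoint for the front end and SQL Server for the back end, and mainly analyzes Customer S's sales data. CS is organized by department and by region, and the two cut across each other: headquarters staff of a department can see data for every region in the country, while regional staff can see only their own region's data. Which data a user may see has to be configurable on a web page. That was the customer's requirement for data-level security.
To meet these requirements, the data-level security solution uses the following architecture: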

[Figure: architecture of the BI data-level security solution]

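Report viewing flow:
1. The user opens a report.
2. The report retrieves its data from the cube.
3. The cube reads the requesting user's permission settings from the database (where users' data permission settings are stored) and returns the corresponding data.
4. The report displays the resulting data.
Data permission configuration flow:
1. The user opens the data permission configuration page (since the solution is SharePoint-based, this is a SharePoint page with an embedded data permission Web Part).
2. The page retrieves the cube structure (the structure of a cube is very large, so to keep the page responsive it is usually shown as an Ajax tree).
3. The user changes the data permission settings and saves them to the database.
Note: the objects that data permissions apply to are domain accounts (domain users or groups).
Looking over the whole flow, we pick out the most important pieces of the implementation to explain in detail. They are:
How access control is done in the cube
How to read the cube structure when setting data permissions
How access control is done in the cube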
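SQL Server Analysis Services itself provides a mechanism for setting data permissions on a cube. In the Analysis Services database you can see two entries, "Assemblies" and "Roles"; these are what data permission setup revolves around, as shown in the figure below:
[Figure: the Assemblies and Roles entries in Analysis Services]

Assembly: a DLL class library, created as a Class Library project in Visual Studio. Its main job is to return the cube data a user is allowed to access.
Role: the role of the requesting users. Here you set the role's members and, more importantly, which assembly the cube calls to obtain the data a user may access.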

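Let's look at the DataSecurity.dll assembly first. Its code is very simple, only a few dozen lines. The main flow is:
1. Read the requesting user's data permission settings.
2. From those settings, return an MDX string for the set of dimension members the user may access.
Here is the main code (this one .cs file is all the class library needs):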

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

namespace BI
{
    public class DataSecurity
    {
        // Placeholder (assumption): the original does not show where connection_string
        // is defined. Supply the connection string of the permissions database here.
        private const string connection_string = "(connection string of the permissions database)";

        // The method name is arbitrary; the parameters are what matter.
        // domain_account: account of the requesting user; as shown later, it is passed in from the role settings.
        // dimension: which dimension to fetch; the role has to be configured for each dimension.
        public static string GetDimensionSet(string domain_account, string dimension)
        {
            // Example result: "{[Location].[City].&[Seattle]}"
            List<string> members = new List<string>();

            // The using blocks close the reader, command, and connection even if an exception is thrown.
            using (SqlConnection connection = new SqlConnection(connection_string))
            using (SqlCommand command = new SqlCommand("SP_Security_GetDimensionSetByLoginAccount", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add(new SqlParameter("@domain_account", domain_account));
                command.Parameters.Add(new SqlParameter("@dimension", dimension));

                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        members.Add((string)reader["DimensionSet"]);
                    }
                }
            }

            // Join the members with commas and wrap them in braces to form an MDX set.
            return "{" + string.Join(",", members.ToArray()) + "}";
        }
    }
}

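What this class library does is simple. BI aside, it just calls a stored procedure, concatenates the returned rows into one string, and returns that string. A typical result looks like {[Location].[City].&[Seattle]}, meaning that within the Location dimension the user can see only Seattle's data and no other city's. For several cities the result is a comma-separated list, for example {[Location].[City].&[Seattle],[Location].[City].&[Washington]}.
The concatenation is simple, as shown above, but where is the data a user may access actually recorded? It is the strings stored in the database when the user's data permissions are set on the web page.
To understand this more fully you need to learn more about MDX, which is beyond the scope of this article.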
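For readers who want to see the string format in isolation, here is a small sketch of building such a set string without any database access. MdxSetBuilder and its two methods are hypothetical helpers, not part of the project described here; the only MDX facts relied on are the [Dimension].[Attribute].&[key] form shown above and the convention that a closing bracket inside a delimited name is escaped by doubling it.

```csharp
using System;
using System.Collections.Generic;

public static class MdxSetBuilder
{
    // Builds a member unique name by key, for example [Location].[City].&[Seattle].
    // A closing bracket inside the key is escaped by doubling it.
    public static string MemberByKey(string dimension, string attribute, string key)
    {
        return "[" + dimension + "].[" + attribute + "].&[" + key.Replace("]", "]]") + "]";
    }

    // Joins member names into an MDX set string; an empty list gives "{}".
    public static string ToSet(IEnumerable<string> members)
    {
        return "{" + string.Join(",", new List<string>(members).ToArray()) + "}";
    }
}
```

Escaping the key matters once the permission strings are assembled from user-entered values, because an unescaped bracket would change the meaning of the set expression.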
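That completes the first step. Next comes calling the DataSecurity.dll class library from the role settings. This is simple but tedious, because every dimension in the cube has to be set by hand. Just follow the figure below; no coding is involved.
[Figure: configuring dimension data security on the role]

Open the properties of the SecruityRole role and go to "Dimension Data" to set the data permissions. Every dimension and attribute that needs data-level control has to be configured, essentially with a single expression: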
StrtoSet(BI.DataSecurity.GetDimensionSet(USERNAME, "City"))
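To explain the expression: StrtoSet converts a string into an MDX set. USERNAME is the domain account of the requesting user, and City is our own parameter, meaning that the authorized data for the City attribute is wanted.
At this point the most important step is done: the core of data-level security is in place. Users, though, need a front end in which to set these permissions, and the rest of this article addresses that. I will cover only the most important part, reading the cube structure; the rest you can design yourself. In the CS project we did the following: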

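All configuration pages are SharePoint Web Parts (see the related material).
Ajax in the Web Part (singled out because it is the troublesome part). Ajax is needed because a cube's structure is very large; reading it all in one go would take so long that nobody would use the page.
The back end controls which dimensions require data permissions (dimensions that need none are simply left out of the Ajax tree).
Custom roles, which are different from the cube role discussed above. These are data permission roles that users define themselves; of the cube role, that single one is enough.
With roles come search, assigning users, setting a role's data permissions, editing, and deleting.
All of this can be designed to suit your needs and does not have to match ours. I used an Ajax tree to show the cube structure, for instance, but you could present it another way.
Now for the next important topic, reading the cube structure. (The data permissions set up earlier already take effect while the structure is being read: data the user has no permission to access does not appear in the Ajax tree.)
How to read the cube structure when setting data permissions
Microsoft already provides a rich class library for reading the structure of a cube. This is only a brief introduction; if you run into problems in your own implementation, searching Google is probably your best option.
Let's go through the details of reading the cube structure. The first step is connecting to your cube, which is done mainly with the following statements.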

using (AdomdConnection adomdConnection = new AdomdConnection())
{
    adomdConnection.ConnectionString = "Data Source=localhost;Catalog=MyCube;";
    adomdConnection.Open();

    // Read the cube structure here

}   // leaving the using block closes and disposes the connection

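Once connected, the whole cube can be read through adomdConnection. The main objects involved are:
Cube: CubeDef cube = adomdConnection.Cubes["MyCubeName"];
Dimension: cube.Dimensions, which contains all the dimensions.
Hierarchy: dimension.Hierarchies, all the hierarchies of a dimension.
Member: hierarchy.Levels[0].GetMembers(), all the members of the hierarchy's first level.
With these few objects you can lay out the structure of the entire cube; how you present the cube data is then up to you. And of course, don't forget to add a reference first: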
Microsoft.AnalysisServices.AdomdClient
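Closing remarks
That covers the most important parts of the BI data-level security solution. Building on them, you can implement your own solution and apply it to your project, where it can become a real selling point.
Implementing the whole solution still takes a good deal of time, of course. Users want something simple and easy to use, and that user-friendly interface work is left to you.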

source: http://www.itvue.com/Article/DB/MSSQL/200903/5534.html

Thursday, November 4, 2010

OLAP PivotTable Extensions (for calculated member issue)

source: http://olappivottableextend.codeplex.com/

OLAP PivotTable Extensions is an Excel 2007 and Excel 2010 add-in which extends the functionality of PivotTables on Analysis Services cubes. The Excel API has certain PivotTable functionality which is not exposed in the UI. OLAP PivotTable Extensions provides an interface for some of this functionality. It also adds some new features like searching cubes, configuring default settings, and filtering to a list in your clipboard. The add-in can be launched from the following menu option in the right-click menu for PivotTables:

Private Calculated Members

Any calculated members which are part of the Analysis Services cube on the server can be added to PivotTables. But there is no built-in UI to define your own MDX calculations. Adding extra Excel calculations in the cells surrounding the PivotTable has some limitations: they are not part of the PivotTable, they can be wiped out if the dimensions of the PivotTable change, and plain Excel calculations can only operate on data visible in the PivotTable.

OLAP PivotTable Extensions let you define your own calculated measures which are private to that particular PivotTable:

Those calculations appear in the PivotTable just like any other calculations:

They appear in the Field List pane at the very top under the Values grouping:

For help with advanced calculation properties for these private PivotTable calculations, or for help with MDX expressions, refer to our Calculations Help page.

Best Practice: In order to ensure a single version of the truth, it is a best practice to define important calculations as part of the cube source code. But some calculations like simple ratios or differences may clutter the cube and may be more appropriate if defined in the PivotTable itself. In addition, certain ad hoc research or prototyping may be more appropriate to be done as calculations private to a PivotTable until they are finalized and are ready to be added to the cube source code.

Limitation: If you run "OLAP Tools... Convert to Formulas" on a PivotTable with private calculated members in Excel 2007, the private calculated members will show N/A. There is no known workaround at this point other than upgrading to Excel 2010 or having your OLAP administrator define these calculated members in the cube itself.

Calculations Library

Since all calculations you define are private to that one PivotTable, OLAP PivotTable Extensions automatically creates a Calculations Library for you which contains all the calculations you create. This allows you to pick any previous calculation you've used from a dropdown and add it to the current PivotTable:

You can also perform Calculation Library maintenance by importing, exporting, and deleting calculations:

View PivotTable MDX

If a PivotTable is performing poorly or returning incorrect numbers, it may be necessary for the Analysis Services administrator to troubleshoot the MDX query which the PivotTable is using. The MDX tab of the OLAP PivotTable Extensions dialog shows you this MDX.

The MDX is exactly what is sent to the server with one exception. Any private calculations you've created for your PivotTable are defined as session calculated members. The MDX query exposed on the MDX tab displays the formulas for these calculations as query calculated members in the WITH clause of the MDX query. This allows an administrator to copy and paste the MDX query and troubleshoot it more easily in Management Studio or MDX Studio.

Filtering PivotTable to a List

A common scenario is having a list of items you wish to research in a PivotTable. Instead of manually checking each item in the filter dropdown, you can use the Filter List feature from OLAP PivotTable Extensions:

Changing PivotTable Defaults

Certain settings must be manually changed after creating a new PivotTable. For instance, if your dimensions have calculated members you wish to see in your PivotTable, you must manually right click on the PivotTable, choose PivotTable Options, flip to the Display tab, then check "Show calculated members from OLAP server". The Defaults tab of OLAP PivotTable Extensions lets you default this setting to be on in any new PivotTables you create in the future. If checked, it also automatically sets "Refresh data when opening the file" on the connection properties:

Searching

Finding what you’re looking for in a cube can sometimes be challenging, but the Search feature of OLAP PivotTable Extensions can help. It lets you text search the items in the Field List and their descriptions. It also lets you text search the dimension members in your cube.

For more detailed information about the Search feature, see the dedicated Search page.

Distributing PivotTables

OLAP PivotTable Extensions need only be installed on computers which need to create new private PivotTable calculations. After those calculations have been defined, the PivotTable can be distributed to others without problem. If you distribute the Excel 2007 workbook to other Excel 2007 users, they will be able to continue designing and manipulating that PivotTable without problem. If published to Excel Services, the private calculations you define will still be active in the PivotTable.

The Calculation Library does not need to be distributed unless other users wish to start brand new PivotTables and reuse certain calculations you have created using OLAP PivotTable Extensions.

Using Excel 2007/2010 and OLAP PivotTable Extensions to edit an Excel 2003 format .xls workbook with a PivotTable and add a private calculated member will work. This PivotTable can be saved and distributed to users of Excel 2003 and the private calculated member will show up and work.

Installation Requirements

Excel 2007 or Excel 2010 is required.
Access to a cube on an Analysis Services server is required.
.NET Framework version 2.0 is required.

Troubleshooting Installation

If OLAP PivotTable Extensions is not visible in Excel, please consult the Troubleshooting Installation page.

Monday, September 13, 2010

MS BI Training Video

http://www.minesage.com/msu/minesage_43_372.html

Tuesday, August 31, 2010

Optimizing the Slowly Changing Dimension Wizard

http://blogs.msdn.com/b/mattm/archive/2010/08/05/optimizing-the-slowly-changing-dimension-wizard.aspx

As a follow-up to my previous post about SCD processing in SSIS, I thought I’d go deeper into using the built-in Slowly Changing Dimension Wizard. While there are definitely more efficient ways to do SCD processing in SSIS, there are some optimizations you can apply to the components that the wizard outputs that might make it more acceptable for your environment.
First, let’s take a look at the Wizard’s strengths. Besides the advantage of not having to write any SQL by hand, there are two key scenarios the SCD Wizard is optimized for:

Small number of change rows - Most ETL best practices say that you should perform change data capture (CDC) at the source (or as close to the source as possible). The Wizard was not designed to process your entire source table (like the Kimball Method SCD component) – for example, it doesn’t detect “deletes”.
Large dimensions – If you are dealing with large dimensions, SCD approaches that read the entire reference table may not be the best approach. For example, if I have 5-10K change rows coming in, and a 2 Million row dimension, the majority of the data flow time will be spent doing a full scan of your source dimension table.

If your scenario doesn’t match the above, you might want to consider just using one of the alternate approaches directly. If it does, or if you don’t want to use any custom components (or hand craft the SQL required for a MERGE statement, for example), consider making the following optimizations:
SCD Wizard output

Slowly Changing Dimension transform (Red)

The first transform does not cache any lookup results from the reference dimension, so every incoming row results in a query against the database. By default, the wizard will open a new connection to the database on each query. For a performance gain (and less resource usage), you can set the RetainSameConnection property of the wizard’s connection manager to True to re-use the same connection on each query.

OLE DB Command transforms (Green)

The wizard will output three separate OLE DB Command transforms (which perform row-by-row updates). You will get a big performance boost by placing the rows in a staging table, and performing the updates in a single batch once the data flow completes. Another option is to use the Batch Destination component, which is available on Codeplex.

OLE DB Destination (Blue)

The default destination that the Wizard outputs will have Fast Load disabled (to avoid locking issues). In many cases (mostly depending on the number of rows you’re processing), you can enable Fast Load for an immediate performance gain. To avoid potential deadlocking issues, another option is to use a staging table, and move the data over to the final destination table once the data flow is complete using an INSERT INTO … SELECT statement.
Using the above optimizations, I was able to bring down the processing time of a 200k row change set (against a 100k row dimension table) from 60 minutes to 14 minutes. Note however, that processing a similar change set using the SQL MERGE statement took under 3 minutes to complete.

I go into more detail about these optimizations (and the difference between SCD processing approaches) in my Performance Design Patterns talk.

Handling Slowly Changing Dimensions in SSIS

http://blogs.msdn.com/b/mattm/archive/2009/11/13/handling-slowly-changing-dimensions-in-ssis.aspx

I had a great time at PASS last week, and had a chance to talk to a lot of different SSIS users. One of the big topics seemed to be Slowly Changing Dimensions – I had a number of people ask for various improvements to the current Slowly Changing Dimension Transform in SSIS, and also ask for recommended alternatives in the meantime. I thought I’d summarize some of the more popular approaches I’ve seen, and see if anyone else has some alternatives.

Slowly Changing Dimension Wizard

You might have already tried the Slowly Changing Dimension Wizard that comes with SSIS 2005 and 2008 (and there are a number of good tutorials out there if you haven’t).
Outputs from the Slowly Changing Dimension Wizard

The SCD Wizard has a few things going for it - it’s quick and easy to implement, it handles most SCD scenarios out of the box, and its multi-component approach means you can customize it with the functionality you need.
It does have some pretty big limitations, however, which end up being a deal breaker for a lot of people.
A major inhibitor is the performance of the transform. It doesn’t perform that well for a couple of different reasons:

The data lookups are not cached – each row results in a SQL Query
OLE DB Command does row by row updates
OLE DB Destination added by the wizard doesn’t use FastLoad by default (which can be easily changed in the OLE DB Destination editor UI)
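To put the first point in perspective, the sketch below shows what a cached lookup saves. It is not how the Slowly Changing Dimension transform works (as noted above, it does not cache); CachedLookup and the queryDatabase delegate are hypothetical stand-ins, with the delegate representing one SQL query per call.

```csharp
using System;
using System.Collections.Generic;

public class CachedLookup
{
    private readonly Func<string, int?> _queryDatabase;   // stand-in for one SQL query per call
    private readonly Dictionary<string, int?> _cache = new Dictionary<string, int?>();

    // Number of times the stand-in query was actually invoked.
    public int QueriesIssued { get; private set; }

    public CachedLookup(Func<string, int?> queryDatabase)
    {
        _queryDatabase = queryDatabase;
    }

    // Returns the surrogate key for a business key (null if not found),
    // going to the database only on a cache miss.
    public int? Find(string businessKey)
    {
        int? key;
        if (_cache.TryGetValue(businessKey, out key)) return key;

        QueriesIssued++;
        key = _queryDatabase(businessKey);
        _cache[businessKey] = key;
        return key;
    }
}
```

With a cache like this, the number of queries is bounded by the number of distinct business keys in the change set rather than by the number of rows, at the cost of holding those keys in memory.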

Another downside to the transform is the “one way” nature of the wizard – running it again (to change columns, for example) means you’ll lose any customizations you might have made to the other transforms.
I recommend using the wizard for simple dimensions, where you’re not processing a lot of data. If performance is a concern, consider one of the following approaches.

Using MERGE

I came across this tip from the Kimball Group when I was putting together my Merge & CDC talk last year.
Using the SQL MERGE Statement for Slowly Changing Dimension Processing
In this approach, you write all of your incoming data to a staging table, and then use Execute SQL Tasks to run MERGE statements (you actually have to do two passes – one for Type 1 changes, and one for Type 2 – see the details in the tip above). I posted the sample packages and code I used in a previous blog post.
The performance in this approach is very good (although it moves the bulk of the work to the database machine, which might not be what you want). I recommend it if you don’t mind staging the data, writing custom SQL, or can’t use a 3rd party component in your environment.

Kimball Method SSIS Slowly Changing Dimension Component

I’ve heard great things about Todd McDermid’s custom SCD Transform. Instead of doing row by row lookups, this transform takes in the dimension as an input. This makes the comparison much faster than the stock SSIS version. It wraps up all of the functionality into a single transform, which is great if you’re following the Kimball methodology.

Table Difference Component

I had the chance to meet with the SQLBI.EU guys at PASS, and they mentioned their Table Difference component. I haven’t tried it out myself, but I remembered an email from one of the SQL Rangers (Binh Cao) that suggested this component for SCD processing. I’ve included his write-up here:
Table difference is an SSIS custom component designed to simplify the management of slowly changing dimensions and – in general – to check the differences between two tables or data flow with SSIS.
The component receives input from two sorted sources and generates different outputs for unchanged, new, deleted or updated rows.

Unchanged rows (are the data rows that are the same in both inputs)
Deleted rows (are the data rows that appear in old source but not in new source)
New rows (are the data rows that appear in new source but not in old source)
Updated rows (are the data rows that appear in both flows but something is changed)

The inputs MUST be sorted and have a collection of fields (keys) that let the component decide when two rows from the inputs represent the same row. This is easily accomplished in SQL Server with a simple “order by” and a suitable index; moreover, an SCD normally maintains an index on the business key, so the sorting requirement is easy to meet and does not represent a problem.
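The reason sorted inputs are enough can be seen in a small merge-style comparison. This is only a sketch of the general technique, not the component's actual code; Row, DiffResult, and SortedDiff are hypothetical names, and a single Value column stands in for all compared columns.

```csharp
using System;
using System.Collections.Generic;

public class Row
{
    public int Key { get; set; }
    public string Value { get; set; }
}

public class DiffResult
{
    public List<Row> Unchanged = new List<Row>();
    public List<Row> Deleted = new List<Row>();
    public List<Row> Added = new List<Row>();
    public List<Row> Updated = new List<Row>();
}

public static class SortedDiff
{
    // Both lists must already be sorted ascending by Key, with unique keys.
    // Each input is read once, front to back, so no lookups are needed.
    public static DiffResult Compare(IList<Row> oldRows, IList<Row> newRows)
    {
        var result = new DiffResult();
        int i = 0, j = 0;
        while (i < oldRows.Count && j < newRows.Count)
        {
            Row o = oldRows[i];
            Row n = newRows[j];
            if (o.Key < n.Key) { result.Deleted.Add(o); i++; }      // only in old
            else if (o.Key > n.Key) { result.Added.Add(n); j++; }   // only in new
            else
            {
                if (o.Value == n.Value) result.Unchanged.Add(n);
                else result.Updated.Add(n);
                i++;
                j++;
            }
        }
        while (i < oldRows.Count) result.Deleted.Add(oldRows[i++]);
        while (j < newRows.Count) result.Added.Add(newRows[j++]);
        return result;
    }
}
```

If either input were unsorted, the key comparison would misclassify rows as new or deleted, which is why the sort order is a hard requirement.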
Clicking on TableDifference control gives the following window.

TableDifference analyzes all the columns in both inputs and compares their names. If the name of two columns and their corresponding types are identical, TableDifference adds them to the available columns to manage.
If the flows are sorted, their sort columns will be marked as key fields, using the same order in which they appear in the sort.
All other columns are assigned a standard Update ID of 10 and are managed as comparable columns.
Using the component editor, you need to provide the following information for the columns:
Check Option: you can choose the column type between:

Key field: column will be used to detect when two rows from the inputs represent the same row. Beware that the inputs must be sorted by those columns
Compare: column will be compared one by one to detect differences
Prefer NEW: columns will be copied from the NEW input directly into the output, no check
Prefer OLD: columns will be copied from the OLD input directly into the output, no check

KeyOrder: If a column is of type “Key Field”, this is the position at which the field appears in the “order by” clause of your query. Beware that the component does not check for the correct sorting sequence; it is up to you to provide this information.
Update ID: Each different Update ID creates a different output flow. If you need to detect when a change appears in some column, you can use different Update IDs. Beware that the lowest Update ID wins, i.e. if AccountNumber has an Update ID of 10 and AddressLine1 has an Update ID of 20, then AccountNumber will be checked first and, if a change is detected, the row will go to update output 10, no matter whether AddressLine1 has a difference.
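The "lowest Update ID wins" rule can be sketched as follows. This is an illustration of the rule as described above, not the component's code; UpdateRouting and ChooseOutput are hypothetical names, and rows are modeled as simple column-to-value dictionaries.

```csharp
using System;
using System.Collections.Generic;

public static class UpdateRouting
{
    // updateIds maps each compared column to its Update ID.
    // Returns the lowest Update ID among columns whose values differ,
    // or null when no compared column changed.
    // Assumes both rows contain every column listed in updateIds.
    public static int? ChooseOutput(
        IDictionary<string, int> updateIds,
        IDictionary<string, string> oldRow,
        IDictionary<string, string> newRow)
    {
        int? winner = null;
        foreach (KeyValuePair<string, int> column in updateIds)
        {
            bool changed = !string.Equals(oldRow[column.Key], newRow[column.Key], StringComparison.Ordinal);
            if (changed && (winner == null || column.Value < winner.Value))
            {
                winner = column.Value;
            }
        }
        return winner;
    }
}
```

With AccountNumber at 10 and AddressLine1 at 20, a row where both columns changed is routed to output 10, and a row where only the address changed is routed to output 20.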

Outputs Panel gives option to choose which output to enable as well as to name and describe the output.

Output Details allows selection of the columns for each output. Here columns that are not needed for an output can be disabled. The picture shows an example of the DELETED output, which has only the Customer Key column in its output. The fewer columns in the output, the better the performance of the component.

In the Misc Options tab, you can define how strings are compared:

The culture ID to use to perform the comparison. If not specified, TableDifference will use the culture ID of the running task. The default is “empty”.
Whether to ignore case during string comparisons. The default is unchecked, so TableDifference compares strings case-sensitively.

In the Warnings panel, it will list any unused column from the input. As you might recall, if two columns are not identical regarding name and type, TableDifference will ignore them. This might be an error but the decision is up to you. By checking the warnings panel you can see if TableDifference is working with all the columns you need it to compare.

BI Study Video

SSIS Performance Design Patterns (video)

http://blogs.msdn.com/b/mattm/archive/2010/06/29/ssis-performance-design-patterns-video.aspx

Project REAL—Business Intelligence in Practice

https://www.microsoft.com/sqlserver/2005/en/us/project-real.aspx#documentation

Project REAL Documentation

The Project REAL team has provided content to help explain the parameters of the project. The following is a list of the documents available for Project REAL:

Project REAL Technical Overview
Read an introduction to the Project REAL system, its data models, subsystems, and deployment scenarios. This paper is the first in a series that explores various aspects of Project REAL and the best practices that it has revealed.
Project REAL: Analysis Services Technical Drilldown
Get a detailed technical discussion of designs and best practices for Analysis Services that were developed in Project REAL. This paper describes each of the different types of objects, such as data sources, data source views, dimensions, hierarchies, attributes, measure groups, and partitions in detail.
Business Intelligence ETL Design Practices
Download this white paper to understand ETL design decisions that were made for each scenario and implementation detail of the Project REAL effort for SQL Server 2005 Integration Services at Barnes & Noble.
Analysis Services 2005 Migration
Learn about the Migration Wizard, a fast and effective tool for moving your existing cubes to Analysis Services 2005.
Project REAL: Data Lifecycle—Partitioning
Download this white paper to get a detailed discussion on how partitioning was implemented in Project REAL, both on the relational data warehouse and in the Analysis Services cubes.
Inventory Predictive Modeling via Microsoft SQL Server 2005 Analysis Services
Discover an approach for building retail out-of-stock predictive models using SQL Server 2005 Analysis Services. When applied to Project REAL data, these models produced very accurate predictions.
Project REAL Monitoring and Instrumentation
This paper describes the system and network instrumentation and monitoring used for Project REAL, focusing on the tools used, their installation and configuration, and the lessons learned.
Using Visual Studio 2005 to Perform Load Testing on a SQL Server 2005 Reporting Services Report Server
This article contains step-by-step instructions for creating a Web page load test, and sample code and instructions for creating a unit test. Also, instructions are provided for setting up the load test that you use to specify load patterns.
Project REAL: Enterprise Class Hardware Tuning for Microsoft Analysis Services
This paper describes best practices for hardware optimization for a multi-terabyte SQL Server 2005 data warehouse and cube. It describes hardware testing and tuning exercises conducted with large-scale Analysis Services cubes.

Wednesday, August 18, 2010

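What Does BI Actually Do Besides Burn Money? The Puzzle of Failed BI Projects

With a success rate of only 30%, do you dare deploy BI?
The hottest topics of 2010 are cloud computing, the Internet of Things, and BI. As information systems mature, CIOs are turning their attention to continuous optimization and to deeper use of those systems. Most organizations have already deployed their basic information systems, and from management's point of view BI has become an important next item for the CIO to deliver. But CIOs are no fools: immature products, assorted constraints, and a very low success rate make them as cautious about BI as they once were about ERP projects. Drawing on real BI cases encountered in past interviews and on cases gathered in the third-party ITPUB community, this article gives a frank picture of where BI stands, with solutions proposed by experienced technical managers and BI experts. To join the discussion, see: http://www.itpub.net/thread-1320555-1-1.html
Setting the scene:
At a manufacturing company in Tianjin, the CIO's BI project ended in failure after two years.
What we learned about the company's BI deployment:
1. The company is a manufacturer that had earlier adopted a foreign ERP system; once ERP went live, it set out to do BI.
2. To control costs, after finishing ERP the company decided to develop a BI system in-house: a simple BI tool with industry and region as its dimensions.
3. Once the in-house BI went live, it turned out not to meet the company's needs. The CIO looked closely at BI vendors in the industry but felt that none of their products was a good fit.
The plea for help: the CIO remains puzzled about how to get BI right. Two years of failure have left him disheartened; whenever the subject comes up he sighs that it still isn't done. BI is only a supporting tool, so why is its failure rate in Chinese companies so high?
Reading the causes of failure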
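BI can fail for many reasons; a single detail or step that is poorly designed may be enough. But there has to be an overall direction, and only if that direction holds steady can a BI project stay on the right track.
On the case above, forum user "markgigi" argues that for an in-house BI system, two to three years barely counts as a start, and that given how this company went about it, failing to meet requirements was predictable; it also points to poor requirements gathering. Second, before moving from ERP to BI, you have to ask whether internal processes have really been re-engineered and standardized around the ERP system and whether data collection is meeting expectations. A company that rushes to find a BI vendor may not even have absorbed ERP fully, has no guiding concept, and is in no position to take on a new BI system. Third, much of what gets built today answers purely functional requirements rather than business requirements. After go-live people find that the system has gained all sorts of features and can produce plenty of data, yet it has not achieved the intended goal of analyzing the existing business and informing decisions about the company's future direction. Providing that is one of the CIO's key functions.
Forum user "innovate511" says the problem is plainly a lack of analysis and planning. The BI team should play the part that strategist-advisers played in ancient times. Why? Business staff are confined to their own narrow area, and senior management looks only at strategic direction and results, so who takes on analyzing how the strategy is to be carried out? Business staff cannot, senior management cannot; only the BI people can. He notes that his own article gives real examples showing that when business departments define the detailed requirements for implementing strategy, the requirements make no sense and the result is an ornament nobody uses. So a BI team without people of that caliber can only develop whatever users ask for until users discard it, with no prospect of growth.
markgigi adds that a BI system, as a tool for analyzing information and supporting decisions, is naturally aimed at the people who make decisions. Up-front research has to be tailored to the industry and the company, with methods to match, but some principles always hold:
1. Who are the core users?
2. What questions matter most to them, and what are their current or medium-term goals?
3. What data sources relate to those questions and goals, and how are they connected?
For manufacturers in general, of course, finance and the supply chain (inventory) are key. In truth, reports, real-time analysis, and dashboards are all just fluff: if what comes out lets the boss make a gut call a little more easily, the project has succeeded.
Money-burning BI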
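When it comes to judging the value of BI projects, many CIOs express frustration and confusion. Forum user "lei" says he is just as puzzled: to this day he does not know what BI does besides burn money, and many people who have worked in BI for years seem either to be selling hype out of habit or to be talking about lofty ideals.
Forum user "bq_wang" suggests lowering expectations first. Start with a data center: consolidate the data, then build a data warehouse, and begin thinking, with the ERP business in mind, about how the data will eventually be used. If the CIO and the technical department do not know the business well, bring in a consulting firm for a diagnosis first; after all, BI is not purely technical work. Start the BI project only once everything is in place.
innovate511 points out again that a data warehouse is built around BI subject-area requirements, that those requirements change over time, and that the DW model therefore needs continual improvement and rebuilding, so there is no finish line. The only thing to do is keep making BI deliver value.
BI delivers value in stages. The first stage is plain reports (sales volume, revenue, profit, inventory, and so on), followed by data quality monitoring and day-to-day data queries. The second stage is business monitoring and early warning driven by business requirements. The third stage targets business optimization: besides warnings, it supplies the relevant business information to help users take the best action at the best time. The fourth stage should be forecasting and real-time intelligent business guidance, which has not yet seen wide use. Only BI that delivers value will last.
About BI: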
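BI stands for Business Intelligence. The concept was first proposed in 1996, when business intelligence was defined as a class of technologies and applications, made up of data warehouses (or data marts), query and reporting, data analysis, data mining, data backup and recovery, and so on, whose purpose is to support business decisions. Today, business intelligence is usually understood as tools that turn an enterprise's existing data into knowledge and help it make sound operating decisions. The data in question includes orders, inventory, transaction accounts, and customer and supplier records from the enterprise's business systems, data about the enterprise's industry and competitors, and data from the rest of its external environment. The decisions BI supports can be operational, tactical, or strategic. Turning data into knowledge draws on data warehousing, online analytical processing (OLAP) tools, and data mining. From a technical standpoint, therefore, business intelligence is not a new technology; it is the combined application of data warehousing, OLAP, and data mining.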

BI成了“鸡肋”如何破解BI的尴尬处境？

目前许多国内企业总是希望自己能在企业数据中，象沃尔玛一样能发现像“啤酒和尿布”这样具有关联性的商业规律，为企业创造最大价值。然而尽管像苏宁电器这样的一线大型零售商业已向BI迈进，但这一切仍只是一个起步，对于大多数本土中小企业来说，甚至都还难言起步，更谈不上基于BI的高级应用了。时至今日，我国许多企业“大而不强”、“规模不经济”的现象并没得到较好的改变。
　　BI何以成了处境尴尬的“鸡肋”？
当前在国内企业在BI应用中有一个十分突出、比较普通的问题，就是没有明确的价值实现方案。智能分析活动大多“淹没在数据当中”，信息挖掘仅停留在数据转换、表册生成上，为数据而统计数据，能提供精确决策的信息功能十分有限。有些企业在BI方面进行了大量的软硬件投资及人力投资，却并不能给企业带来预期的管理效率，BI有时成为“用之无味、弃之可惜”的鸡肋。中小企业面对商业智能，更是时有“水中月、镜中花”的感觉。
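Why Has BI Become an Awkward "Chicken Rib" at So Many Companies?
Poor commonality, with figures coming from many "doors". For historical reasons, the departments inside a company draw data from different sources that contradict one another. In group companies, especially those that used to be managed in a decentralized way, the subsidiaries use different financial reporting definitions and a patchwork of systems, which makes data hard to share and exchange. Moreover, even within the same field of the same system, formats and naming conventions are often poorly defined, which can easily lead a BI project to fail. In addition, "information silos" leave data scattered and discontinuous, which hinders online analysis and makes comprehensive analysis hard.
Low staff skills and a weak foundation. BI software is powerful and highly complex, and is generally hard to master without dedicated training. Yet many companies are short of technical staff, skill levels are not high, and staff often resist software and technology that is too complex. Furthermore, at many companies management has long paid little attention to using BI, treating it as a stopgap, and has never set up a mechanism for business analysis information. This weak foundation also hurts the quality and efficiency of business analysis.
Configuration and maintenance costs are too high to bear. Building a good BI system means buying not only BI software but also data warehouse servers and a large database system, plus staffing IT technicians, with ever-increasing upgrade fees along the way. These items alone cost several hundred thousand to over a million, which leaves small, cash-poor companies sighing at the sight. Some small and medium-sized companies do launch BI but, unable to bear the mounting financial pressure, give up halfway; this happens often.
Data is mostly "past tense" and of limited practical value. Corporate BI use today mostly stops at running-account reporting of past data and information, with simple analysis such as line charts, bar charts and pie charts. More advanced statistical methods are not used for in-depth analysis and mining, so business analysis reports are of low quality and their value is greatly reduced. Meanwhile, companies do far too little to research, collect and anticipate the external information they need more urgently: consumer behavior, market trends, and information on suppliers, distributors and agents. Many vendors' BI systems also lack the ability to foresee and gauge where the industry is heading. The data functions are mostly "past tense" rather than "present continuous" or "future tense", so the material lacks a reliable and effective basis, does little to strengthen competitiveness, and leaves management "fond of it but ignoring it".
Systems are cut off from one another, with no overall control. As companies bring in more and more IT systems, BI risks being carved up and marginalized. It is increasingly treated as one sub-module of enterprise informatization, with differences in timing and technology. This limitation in design thinking creates an integration bottleneck between BI and the other modules: whether or not a company's ERP, CRM, OA and BI systems come from the same vendor, they are hard to connect and integrate naturally. As a result, information transfer between internal modules hits a bottleneck, systems are isolated from each other, data is hard to share effectively, and it is hard to raise the company's management level and profit.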
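Overcoming BI's Difficulties to Support Precise Management
After nearly 20 years of rapid growth, many industries and companies in China have hit new bottlenecks. Analysys International (易观国际) points out that Chinese companies now face new challenges: their informatization solutions have become rigid, for example with relatively backward management models, risk control methods and use of data, making it hard to keep up with changing conditions and provide better, faster service.
Given this, how should Chinese companies' BI overcome the bottleneck and break through? How can they analyze the vast ocean of data in time to reach sound, useful decision information and respond faster?
Strengthen the groundwork for enterprise BI information. BI must be built on a certain foundation of enterprise informatization. If the groundwork, such as the company's databases, is not solid, then however large the BI investment, the result is a house built on sand, ready to topple. Only when the information groundwork is done does BI have a basic platform to run on and a basis for normal operation once introduced. The main task is a data standardization project: build an enterprise data dictionary, unify field definitions and statistical definitions, and perform a one-off cleansing and conversion of systems and databases with poor data quality, so as to solidify the foundation for a successful BI project.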
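As a rough illustration of what a data dictionary with unified field definitions might look like in practice, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the system names (`erp`, `crm`), the field names, and the date formats are hypothetical and do not come from any real product.

```python
from datetime import datetime

# Hypothetical data dictionary: maps each source system's field names to one
# canonical name. All system and field names are invented for illustration.
DATA_DICTIONARY = {
    "erp": {"cust_no": "customer_id", "amt": "sales_amount", "dt": "order_date"},
    "crm": {"CustomerID": "customer_id", "SalesAmt": "sales_amount", "OrderDate": "order_date"},
}

# Date formats assumed to occur in the source systems (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def parse_date(text):
    """Try each known format and return the date as an ISO 8601 string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def standardize(system, record):
    """Rename fields to canonical names, then normalize their formats."""
    mapping = DATA_DICTIONARY[system]
    out = {}
    for field, value in record.items():
        if field in mapping:  # fields without a dictionary entry are dropped here
            out[mapping[field]] = value
    out["customer_id"] = str(out["customer_id"]).strip().upper()
    out["sales_amount"] = round(float(out["sales_amount"]), 2)
    out["order_date"] = parse_date(out["order_date"])
    return out


erp_row = standardize("erp", {"cust_no": " c001 ", "amt": "1200.50", "dt": "2011/07/08"})
crm_row = standardize("crm", {"CustomerID": "C001", "SalesAmt": 1200.5, "OrderDate": "08.07.2011"})
print(erp_row == crm_row)  # True: both systems now describe the order identically
```

The point of the sketch is only that the mapping lives in one shared dictionary rather than in each report; a real standardization project would also need to handle missing fields and conflicting values.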
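Emphasize collaboration and seamless integration with other business systems. Mergers and acquisitions among large enterprise groups often face many challenges, and one of the biggest is how to integrate the different information systems inside the company so that they blend, cooperate and reinforce one another. BI should therefore have strong collaboration capabilities. First, vendors should start from the company's needs and connect well with front-end and back-end data. More important is internal collaboration: better integration with the other business systems such as CRM, ERP, OA and finance, so that the business and management data already stored in the company's existing CRM, OA, MIS, ERP and financial systems is integrated as fully as possible into the workflow system. This gives a unified interface and unified accounts, with business functions tightly integrated through processes, so users need not switch between systems. Data lookups become convenient and trustworthy, and the company gets unified, accurate and effective management information.
Put a sound BI project management system in place. A project management system is a set of scientific, systematic methods and strategies that helps a company complete IT projects smoothly. A system that is genuinely good and suits the company can manage projects effectively and greatly improve completion efficiency. Companies, especially large groups, must therefore approach BI construction from the standpoint of systems engineering and scientific management, and establish a complete IT project management system and operating mechanism; only then can the BI project succeed. The main elements are: raising the basic competence of IT staff; setting clear, quantified goals for BI use; providing training in BI and other modern management knowledge; bringing in third-party management consulting; carrying out BI requirements analysis; pursuing management innovation; carrying out business process reengineering; and gradually introducing BI project supervision, evaluation and acceptance mechanisms.
Shift in three directions to create new competitive advantage. Whether in product selection or in deciding future focus areas, companies must watch and grasp how BI will evolve and where its main applications lie; only then can they get to the root and have BI serve decision-making. BI's future direction has three main aspects:
① From data-driven to business-driven. Data-driven means the business is assisted by deep mining of data; the shift is to let the business be the driver: start from the business and use data well according to business strategy and the analysis it requires.
② From a focus on technology to a focus on application. BI will no longer be a pile of technologies, but will combine those technologies with application as the guide, to better serve business improvement.
③ From a focus on tools to a focus on the performance the tools produce. BI will no longer be shorthand for reporting tools and OLAP tools; instead, a range of tools will support performance gains. There are seven main kinds: descriptive statistics tools; reporting and interface tools; operational techniques and methods; economic forecasting methods and models; OLAP analysis; knowledge discovery tools; and expert systems together with decision methods and models. These tools should be well used and fully deployed.
Prescribe the right remedy and pay close attention to BI product selection. Selection is the precondition for successful informatization; whether the company "picks the right groom" determines whether BI can be rolled out smoothly. For small and medium-sized companies, which make up the majority, a good and suitable BI product must meet the following conditions: an affordable price, good value for money and quick short-term results; simple use and administration with no special IT investment; functionality that is just sufficient, supporting basic business and performance management; good generality and ease of use, with a familiar interface; and the technical ability to add functions as the company grows, with easy maintenance and upgrades.
In short, Chinese companies should recognize that BI is a comprehensive enterprise application system. Implementing it is not simply installing and debugging software, nor only buying and setting up hardware, and still less only instilling management ideas. BI implementation requires the right remedy for the problem, guided by the principle of overall planning and phased implementation, and must draw out BI's value efficiently. Only then can a company leverage BI to stay ahead of the market.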

Tuesday, July 20, 2010

Hilarious: Understand ERP in Two Minutes!

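Making a technical subject easy to understand:
ERP (Enterprise Resource Planning) is a management platform built on information technology that applies systematic management thinking to give decision-makers and employees the means to make and carry out decisions.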
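One day at noon, the husband phones home from work: "Darling, may I bring a few colleagues home for dinner tonight?" (order intent)
Wife: "Of course. How many people, what time, and what would you like to eat?"
Husband: "Six people. We'll be back around 7. Please prepare some wine, roast duck, scrambled eggs with tomato, a cold dish, egg drop soup... Is that all right?" (business communication)
Wife: "No problem, I'll have it ready." (order confirmation)
The wife writes down the menu (MPS plan), lists what she has to prepare: duck, wine, tomatoes, eggs, seasoning... (BOM, bill of materials), and works out that she needs 1 duck, 5 bottles of wine, 4 tomatoes... (BOM explosion). The scrambled eggs need 6 eggs and the egg drop soup needs 4 eggs (shared material).
She opens the refrigerator (warehouse) and finds only 2 eggs left (material shortage).
At the market, the wife asks: "How much are the eggs?" (purchase inquiry)
Vendor: "1 yuan each, 5 yuan for half a dozen, 9.5 yuan a dozen."
Wife: "I only need 8, but this time I'll buy a dozen." (economic order quantity purchasing)
Wife: "This one is bad; give me another." (inspection, return, replacement)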
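Back home she gets ready to wash, chop and cook... (process routing); the kitchen has a gas stove, a microwave, a rice cooker... (work centers).
She finds that plucking the duck takes the most time (bottleneck operation, critical routing), and roasting the duck herself in the microwave might not be done in time (insufficient capacity), so she decides to buy a ready-made one from the restaurant downstairs (outsourcing the product).
At 4 p.m. the son calls: "Mom, a few classmates want to come over for dinner tonight. Please get things ready." (rush order)
"All right. What would you like to eat? Dad has guests tonight too. Would you like to eat with them?"
"The dishes are up to you, but there must be scrambled eggs with tomato. We won't eat with the grown-ups. We'll be back around 6:30." (orders cannot be merged)
"All right, you'll be satisfied." (order confirmed)
"Not enough eggs again. I'll phone the corner shop to deliver some." (emergency purchase)
At 6:30 everything is ready, but the roast duck has not arrived. She hurriedly calls: "This is Mrs. Li. Why hasn't the roast duck I ordered been delivered?" (expediting the outsourced purchase order)
"Sorry, the delivery person has already left. Probably stuck in traffic; it will arrive soon."
The doorbell rings.
"Mrs. Li, here is the roast duck you ordered. Please sign the slip." (receiving inspection, goods receipt, posting to accounts payable)
At 6:45 the daughter calls: "Mom, I'd like to bring some friends home for dinner right now, OK?" (ha, another rush order intent, demanding goods from stock)
"No, dear. Mom already has two tables of dinner to prepare today and there really isn't time. I'm very sorry. Tell me earlier next time and I'll have it ready for you." (ha, this is the limitation of ERP: it needs a stable external environment and at least a minimum lead time)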
…… ……
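After seeing off all the guests, the exhausted wife sits on the sofa and says to her husband: "Darling, we're entertaining very often now. We should buy some kitchen equipment (equipment procurement), and ideally hire a housekeeper too (now even the human resources system has a gap)."
Husband: "You're in charge at home. Do whatever is needed." (approved)
Wife: "Also, household spending has been too high lately. Could you chip in from your private stash?" (and finally, collecting accounts receivable)
Does anyone still not understand ERP? Remember: every competent housewife is a strong contender for the job of plant manager.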
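For readers who prefer code to kitchens, the egg arithmetic in the story (BOM explosion, shared material, shortage, and buying by the lot) can be sketched in a few lines of Python. This is only a toy illustration, not how any real ERP package works: the dish names and function names are made up, and the quantities are the ones the story states (6 eggs for the scrambled eggs, 4 for the soup, 2 in the refrigerator, eggs bought by the dozen).

```python
from collections import Counter

# Bill of materials per dish, limited to the egg quantities given in the story.
BOM = {
    "scrambled eggs with tomato": {"egg": 6},
    "egg drop soup": {"egg": 4},
}


def explode(menu, bom):
    """BOM explosion: sum component requirements across every dish on the menu."""
    gross = Counter()
    for dish in menu:
        gross.update(bom[dish])  # shared materials such as eggs accumulate
    return gross


def net_requirements(gross, on_hand):
    """Gross requirement minus stock on hand, never below zero."""
    return {item: max(qty - on_hand.get(item, 0), 0) for item, qty in gross.items()}


def lot_sized_order(needed, lot_size):
    """Round the purchase up to whole lots (here, dozens)."""
    lots = -(-needed // lot_size)  # ceiling division
    return lots * lot_size


gross = explode(["scrambled eggs with tomato", "egg drop soup"], BOM)
net = net_requirements(gross, {"egg": 2})      # the refrigerator is the warehouse
order = lot_sized_order(net["egg"], 12)
print(gross["egg"], net["egg"], order)  # 10 8 12
```

The three numbers match the story: 10 eggs needed in total, 8 short after checking the refrigerator, and 12 actually bought.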