Managed Meta Data Environment (MME):
A Complete Walkthrough (Part I)
by David Marco
This article is adapted from the book "Universal Meta Data Models" by David Marco & Michael Jennings, John Wiley & Sons
Almost every corporation and government agency has already built, is in the process of building, or is looking to build a Managed Meta Data Environment (MME). Many organizations, however, are making fundamental mistakes. An enterprise may build many meta data repositories, or “islands of meta data” that are not linked together, and as a result do not provide as much value (see “Where’s my meta data architecture?” sidebar).
Let’s take a quick meta data management quiz. What is the most common form of meta data architecture? It is likely that most of you will answer, “centralized”; but the real answer is “bad architecture”. Most meta data repository architectures are built the same way data warehouse architectures were built: badly. The data warehouse architecture issue resulted in many Global 2000 companies rebuilding their data warehousing applications, sometimes from the ground up. Many of the meta data repositories being built or already in use need to be completely rebuilt.
The goal of this article is to ensure that your MME’s architecture is constructed on a rock-solid foundation that provides your organization with a significant advantage over poorly architected MMEs. I will present the complete MME architecture and walk through, in detail, each of its six major components and the sustainment of the MME.
WHERE’S MY META DATA ARCHITECTURE?
At EWSolutions one of our clients is a large pharmaceutical company. Since knowledge is the lifeblood of any pharmaceutical company, these types of firms tend to have very large meta data requirements and staffs. This company had decided to have a “Meta Data Day” and, as such, had asked me to come on-site and give a keynote address to kick the day off. Between 60 and 80 people attended “Meta Data Day”.
After the keynote address came a series of workshops. We counted 4 separate meta data repositories in production and 3 other separate new meta data repository initiatives starting up -- a classic “islands of meta data” problem. This is not an approach that leads to long-term positive results. None of these islands is linked to the others, and much of the most valuable meta data functionality comes from the relationships that the meta data has with itself. For example, it is highly valuable to view a physical column name (technical meta data) and then drill across to the business definition (business meta data) of that physical column name.
The managed meta data environment represents the architectural components, people and processes that are required to properly and systematically gather, retain and disseminate meta data throughout the enterprise. The MME encapsulates the concepts of meta data repositories, catalogs, data dictionaries and any other term that people have thrown out to refer to the systematic management of meta data. Some people mistakenly describe an MME as a data warehouse for meta data. In actuality, an MME is an operational system and as such is architected in a vastly different manner than a data warehouse.
Companies that are looking to truly and efficiently manage meta data from an enterprise perspective need to have a fully functional MME. It is important to note that a company should not try to store all of its meta data in an MME, just as the company would not try to store all of its data in a data warehouse. Without the MME’s components, it is very difficult to manage meta data effectively in a large organization. The six components of the MME, shown in Figure 1, are:
- Meta data sourcing layer
- Meta data integration layer
- Meta data repository
- Meta data management layer
- Meta data marts
- Meta data delivery layer
Figure 1: Managed Meta Data Environment
An MME can be built using the centralized, decentralized or distributed architecture approach:
- Centralized architecture offers a single, uniform, and consistent meta model that mandates the schema for defining and organizing the various meta data stored in a global meta data repository. This allows for a consolidated approach to administering and sharing meta data across the enterprise.
- Decentralized architecture creates a uniform and consistent meta model that mandates the schema for defining and organizing a global subset of meta data to be stored in a global meta data repository and in the designated shared meta data elements that appear in local meta data repositories. All meta data that is shared and re-used among the various local repositories must first go through the global repository, but sharing and access to the local meta data are independent of the global repository.
- Distributed architecture includes several disjointed and autonomous meta data repositories that have their own meta models to dictate their internal meta data content and organization, with each repository solely responsible for the sharing and administration of its meta data. The global meta data repository does not hold the meta data that appears in the local repositories; instead, it holds pointers to the meta data in the local repositories and meta data on how to access it [1].
At EWSolutions we have built MMEs that use each of these three architectural approaches, and some implementations combine these techniques in one MME.
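To make the pointer idea in the distributed approach more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class names (`StoredEntry`, `PointerEntry`, `GlobalRepository`), the dictionary standing in for a local repository, and the sample entries are all hypothetical and are not taken from any MME product.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class StoredEntry:
    """Meta data held directly in the global repository."""
    value: str


@dataclass(frozen=True)
class PointerEntry:
    """Pointer to meta data that stays in a local repository."""
    local_repository: str
    key: str


class GlobalRepository:
    """Hypothetical global repository mixing stored entries and pointers."""

    def __init__(self, local_repositories: Dict[str, Dict[str, str]]) -> None:
        self._local = local_repositories
        self._entries: Dict[str, Union[StoredEntry, PointerEntry]] = {}

    def register(self, name: str, entry: Union[StoredEntry, PointerEntry]) -> None:
        self._entries[name] = entry

    def lookup(self, name: str) -> str:
        entry = self._entries[name]
        if isinstance(entry, StoredEntry):
            return entry.value
        # Pointer: fetch from the owning local repository at request time.
        return self._local[entry.local_repository][entry.key]


# Illustrative data only.
local_repositories = {"finance": {"GL_ACCT_NO": "General ledger account number"}}
repository = GlobalRepository(local_repositories)
repository.register("CUST_ID", StoredEntry("Unique customer identifier"))
repository.register("GL_ACCT_NO", PointerEntry("finance", "GL_ACCT_NO"))
```

Because a pointer is resolved only when the meta data is requested, a change made in the local repository is visible on the next lookup without reloading the global repository. The trade-off is that every lookup depends on the local repository being reachable.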
Meta Data Sourcing Layer
The meta data sourcing layer is the first component of the MME architecture. The purpose of the Meta Data Sourcing Layer is to extract meta data from its source and to send it into the Meta Data Integration Layer or directly into the meta data repository (see Figure 2). Some meta data will be accessed by the MME through the use of pointers (distributed) that will present the meta data to the end user at the time that it is requested. The pointers are managed by the Meta Data Sourcing Layer and stored in the Meta Data Repository.
Figure 2: Meta Data Sourcing Layer
It is best to send the extracted meta data to the same hardware location as the Meta Data Repository. Often meta data architects incorrectly build meta data integration processes on the platform that the meta data is sourced from (other than record selection, which is acceptable). This merging of the meta data sourcing layer with the meta data integration layer is a common mistake that causes a whole host of issues.
As sources of meta data are changed and added (and they will be), the meta data integration process is negatively impacted. When the meta data sourcing layer is separated from the Meta Data Integration Layer, only the meta data sourcing layer is impacted by this type of change. By keeping all of the meta data together on the target platform, the meta data architect can adapt the integration processes much more easily.
Keeping the sourcing (extraction) layer separate from the integration (transformation) layer provides a tidy backup and restart point. Meta data loading errors typically happen in the meta data transformation layer. Without a separate extraction layer, if an error occurred the architect would have to go back to the source of the meta data and re-read it. This can cause a number of problems. If the source of meta data has been updated, it may become out of sync with some of the other sources of meta data that it integrates with. Also, the meta data source may currently be in use, and this processing could impact the performance of the meta data source. The golden rule of meta data extraction is:
Never have multiple processes extracting the same meta data from the same meta data source.
When this rule is broken, the timeliness and consequently the accuracy of the meta data can be compromised. For example, suppose that you have built one meta data extraction process (Process #1) that reads physical attribute names from a modeling tool’s tables to load a target entity in the meta model table that contains physical attribute names. You also built a second process (Process #2) to read and load attribute domain values. It is possible that the attribute table in the modeling tool has been changed between the running of Process #1 and Process #2. This situation would cause the meta data to be out of sync.
This situation can also cause unnecessary delays in the loading of the meta data with meta data sources that have limited availability/batch windows. For example, if you were reading database logs from your enterprise resource planning (ERP) system, you would not want to run multiple extraction processes on these logs since they most likely have a limited batch window. While this situation doesn’t happen often, there is no reason to build unnecessary flaws into your meta data architecture.
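The golden rule can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `ModelingToolSource`, `extract_once`, and the two loader functions are hypothetical names, and an in-memory list stands in for the staged extract that would normally sit on the repository platform. The source is read exactly once, and both load steps work from that staged copy.

```python
import copy
from typing import Dict, List


class ModelingToolSource:
    """Hypothetical stand-in for a modeling tool's attribute table."""

    def __init__(self, rows: List[Dict]) -> None:
        self._rows = rows
        self.read_count = 0  # how many times the source has been read

    def read_attributes(self) -> List[Dict]:
        self.read_count += 1
        return copy.deepcopy(self._rows)

    def change_domain(self, physical_name: str, domain: List[str]) -> None:
        """Simulate the source changing between load steps."""
        for row in self._rows:
            if row["physical_name"] == physical_name:
                row["domain"] = domain


def extract_once(source: ModelingToolSource) -> List[Dict]:
    """Single extraction; the result is the staged restart point."""
    return source.read_attributes()


def load_attribute_names(staged: List[Dict]) -> List[str]:
    return [row["physical_name"] for row in staged]


def load_domain_values(staged: List[Dict]) -> Dict[str, List[str]]:
    return {row["physical_name"]: list(row["domain"]) for row in staged}


# Illustrative data only.
source = ModelingToolSource([
    {"physical_name": "CUST_STATUS", "domain": ["A", "I"]},
    {"physical_name": "ORDER_TYPE", "domain": ["W", "R"]},
])
staged = extract_once(source)
names = load_attribute_names(staged)
source.change_domain("CUST_STATUS", ["A", "I", "X"])  # source changes mid-load
domains = load_domain_values(staged)
```

Even though the source changes between the two load steps, both steps see the same snapshot, so the attribute names and domain values stay in sync. If a load step fails, it can be rerun against `staged` without touching the source again.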
The number and variety of meta data sources will vary greatly based on the business requirements of your MME. Though there are sources of meta data that many companies commonly source, I’ve never seen two meta data repositories that have exactly the same meta data sources (have you ever seen two data warehouses with exactly the same source information?). The following are the most common meta data sources:
- Software tools
- End users
- Documents and spreadsheets
- Messaging and transactions
- Web sites and E-commerce
- Third parties
Software Tools
A great deal of valuable meta data is stored in various software tools. Keep in mind that many of these tools have internal meta data repositories designed to enable the tool’s specific functionality and typically are not designed to be accessed by meta data users or integrated with other sources of meta data. You will need to set up processes to go into these tools’ repositories and pull the meta data out.
Of these tools, relational databases and modeling tools are the most common sources of meta data for the meta data sourcing layer. The MME usually reads the database’s system tables to extract meta data about physical column names, logical attribute names, physical table names, logical entity names, relationships, indexing, change data capture, and data access.
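As a rough illustration of reading a database’s system tables, the following Python sketch uses the standard library’s `sqlite3` module. SQLite is used here only as a convenient stand-in; each database product exposes its catalog differently, so the queries would need to be adapted. The function names and the demo schema are hypothetical.

```python
import sqlite3
from typing import Dict


def build_demo_database() -> sqlite3.Connection:
    """Create a small in-memory database to extract meta data from."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id)
        );
        CREATE INDEX idx_orders_customer ON orders(customer_id);
    """)
    return conn


def extract_technical_meta_data(conn: sqlite3.Connection) -> Dict[str, Dict]:
    """Read table, column, relationship and index meta data from the catalog."""
    meta: Dict[str, Dict] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = [
            {"name": row[1], "type": row[2], "primary_key": bool(row[5])}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # foreign_key_list rows: id, seq, table, from, to, ...
        relationships = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        # index_list rows: seq, name, unique, ...
        indexes = [row[1] for row in conn.execute(f'PRAGMA index_list("{table}")')]
        meta[table] = {
            "columns": columns,
            "relationships": relationships,
            "indexes": indexes,
        }
    return meta


technical_meta_data = extract_technical_meta_data(build_demo_database())
```

Note that a database catalog holds physical names only. Logical entity and attribute names would typically come from a modeling tool and be matched to these physical names during integration.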
End Users
End users are one of the most important sources of meta data that is brought into the MME. These users come in two flavors: business and technical. Figure 3 lists the types of meta data entry done by each group.
Often the business meta data for a corporation is stored in the collective consciousness of its employees as “tribal knowledge”. As a result, it is vital for the business users to input business meta data into the repository. The need for active and engaged business users ties into the topic of data stewardship [2].
Figure 3: Meta Data Sourcing Layer: End User Meta Data Entry
The technical users also need direct access into the Meta Data Repository to input their technical meta data. Because much of the technical meta data is stored in various software tools, the task for technical users to input the technical meta data is not as rigorous as it is for business users to input the business meta data.
The interface for both of these user groups should be Web-enabled. The Web provides an easy-to-use and intuitive interface that both of these groups are familiar with. It is critical that this interface is directly linked to the meta data in the repository. I strongly suggest the use of drop-down boxes and pick lists, as these are functions that users are highly familiar with. You should always use the referential integrity that the database software provides.
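The combination of pick lists and database-enforced referential integrity can be sketched as follows. This is a minimal example using Python’s built-in `sqlite3` module as a stand-in for the repository database; the table names (`subject_area`, `business_definition`) and function names are hypothetical. The pick list is populated from the reference table, and the foreign key rejects any value that bypasses the pick list.

```python
import sqlite3
from typing import List


def open_repository() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE subject_area (name TEXT PRIMARY KEY);
        CREATE TABLE business_definition (
            term TEXT PRIMARY KEY,
            definition TEXT NOT NULL,
            subject_area TEXT NOT NULL REFERENCES subject_area(name)
        );
        INSERT INTO subject_area VALUES ('Customer'), ('Finance');
    """)
    return conn


def pick_list(conn: sqlite3.Connection) -> List[str]:
    """Values the entry form would offer in its drop-down box."""
    rows = conn.execute("SELECT name FROM subject_area ORDER BY name")
    return [name for (name,) in rows]


def save_definition(conn: sqlite3.Connection, term: str,
                    definition: str, subject_area: str) -> bool:
    """Return True if the database accepted the entry, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO business_definition VALUES (?, ?, ?)",
                (term, definition, subject_area),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_repository()
accepted = save_definition(conn, "Churn", "Customers lost in a period", "Customer")
rejected = save_definition(conn, "EBITDA", "Earnings measure", "Fnance")  # typo
```

Letting the database reject the misspelled subject area means the rule holds no matter which interface or load process writes to the repository.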
Documents and Spreadsheets
A great deal of meta data is stored in corporate documents (Microsoft Word) and spreadsheets (Microsoft Excel). The requirements of your MME will greatly impact the degree to which you need to bring in meta data from documents or to provide pointers to them. Sometimes these documents and spreadsheets are located in a central area of a network or on an employee’s computer. In most organizations, though, documents and spreadsheets tend to be highly volatile, and lack standardized formats and business rules. As a result, they are traditionally one of the most unreliable and problematic sources of meta data in the MME. Sometimes business meta data for these sources can be found in the note or comment fields associated with the document or with a cell (if a spreadsheet). Technical meta data, such as calculations, dependencies, or lookup values, is stored in the application’s (Microsoft Excel or Lotus 1-2-3) proprietary data store.
For companies that have implemented a document management system, it’s important to extract the meta data out of these sources and bring it into the MME’s repository. Typically when a company builds a document or content management system, it also purchases a software product to aid management of meta data on documents, images, audio, geospatial data (geographical topography), and spreadsheets. It is important to have a meta data sourcing layer that can read the meta data in the document management tool, extract it, and bring it into the MME’s repository. This task is extremely difficult because most document management companies do not understand that their products are really meta data repositories and, as such, need to be accessible. These tools often employ proprietary database software to persist their meta data, and/or their internal database structure is highly obfuscated, meaning that the structure of the meta data is not represented in the meta model but is instead represented in program code. As a result, it can be difficult to build processes to pull meta data out of these sources (Figure 4).
Figure 4: Meta Data Sourcing Layer: Document Management Sources
Messaging and Transactions
Many companies and government agencies are using some form of messaging and transactions, either Enterprise Application Integration (EAI) or XML (sometimes EAI applications use XML), to transfer data from one system to another. The use of EAI and XML is a popular trend as enterprises struggle with the high cost of maintaining current point-to-point approaches to data integration. The problem with point-to-point integration is that the information technology (IT) environment becomes so complex that it is impossible to manage it effectively or efficiently, especially if you do not have an enterprise level MME. An EAI messaging paradigm should help companies unravel their current point-to-point integration approaches. Figure 5 shows an EAI messaging bus which provides the technical engine for the EAI messages.
Figure 5: EAI Messaging Bus
While the vast majority of companies are not very advanced in their use and application of EAI and XML, these types of processes can be used to capture highly valuable meta data: business rules, data quality statistics, data lineage, data rationalization processes, etc. Since the EAI tools are designed to manage the messaging bus, not the meta data around it, it is important to bring this meta data from the EAI tools into the MME to allow for global access, historical management, publishing and distribution. Without a good MME it becomes very difficult to maintain these types of applications. Large government organizations and major corporations are using their MMEs to address this challenge.
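As a small illustration of capturing meta data from a message as it passes by, the following Python sketch parses an XML message with the standard library and records simple lineage and data quality meta data. The message layout, its `source`, `target` and `sent` attributes, and the function name are assumptions made for this example; real EAI tools define their own message formats.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical message format, for illustration only.
SAMPLE_MESSAGE = """<message source="order_entry" target="billing" sent="2003-01-15T10:30:00">
  <order>
    <order_id>1001</order_id>
    <customer_id>C-77</customer_id>
    <ship_date></ship_date>
  </order>
</message>"""


def capture_message_meta_data(xml_text: str) -> Dict:
    """Derive lineage and a simple data quality statistic from one message."""
    root = ET.fromstring(xml_text)
    # Leaf elements carry the data fields; container elements are skipped.
    fields = [el for el in root.iter() if len(el) == 0 and el is not root]
    empty_fields = [el.tag for el in fields if not (el.text or "").strip()]
    return {
        "lineage": {"source": root.get("source"), "target": root.get("target")},
        "sent": root.get("sent"),
        "fields": [el.tag for el in fields],
        "empty_fields": empty_fields,
    }


message_meta_data = capture_message_meta_data(SAMPLE_MESSAGE)
```

Accumulated over many messages, records like this would show which systems feed which (data lineage) and how often particular fields arrive empty (data quality statistics).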
Within the wide array of applications a corporation uses, some will be custom-built by the enterprise’s IT department (e.g. data warehouses, general ledger systems, payroll, supply chain management), others will be based on packages (e.g. PeopleSoft, SAP, Siebel), and some may be outsourced or based on an Application Service Provider (ASP) model. This proliferation of applications can be quite voluminous. For example, we know of several corporations and government agencies whose applications number in the thousands.
Each of these applications contains valuable meta data that may need to be extracted and brought into the MME application. Assuming the applications are built on one of the popular relational database platforms (e.g. IBM, Oracle, Microsoft, Sybase, Teradata), the Meta Data Sourcing Layer can read the system tables or logs of these databases. There is also considerable meta data stored within these varied applications. Business rules and lookup values are buried within the application code or control tables. In these situations, a process needs to be built to bring in the meta data.
Web Sites and E-Commerce
One of the least used sources of meta data is corporate Web sites. Many companies forget the amount of valuable meta data that is contained (or should we say locked away) in hypertext markup language (HTML) on Web sites. For example, healthcare insurance industry analysts need to know the latest information about the testing of a new drug for patient treatments. Research is typically conducted by a doctor working with a hospital. The doctor usually posts his findings to the hospital’s Web site or portal, so it’s important to capture meta data around these Web sites, such as when the site was updated, what was updated, and so on.
This also applies to e-commerce. When a customer orders a product via the Web, valuable meta data is generated and needs to be captured into the MME.
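To show how meta data locked away in HTML might be pulled out, here is a minimal Python sketch built on the standard library’s `html.parser`. The sample page and its `author` and `last-modified` meta tags are hypothetical; real pages vary widely in which tags they carry, so a production process would need to handle missing or inconsistent tags.

```python
from html.parser import HTMLParser
from typing import Dict

# Hypothetical page, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Clinical Trial Findings</title>
<meta name="author" content="Research Department">
<meta name="last-modified" content="2003-02-01">
</head><body><p>Results of the study.</p></body></html>"""


class PageMetaDataParser(HTMLParser):
    """Collect the page title and named <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_data: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attributes:
            name = attributes["name"].lower()
            self.meta_data[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta_data["title"] = self.meta_data.get("title", "") + data


def extract_page_meta_data(html_text: str) -> Dict[str, str]:
    parser = PageMetaDataParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta_data


page_meta_data = extract_page_meta_data(SAMPLE_PAGE)
```

Comparing the captured `last-modified` value between runs is one simple way to record when a site was updated.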
Third Parties
For many companies it is a standard business process to interact heavily with third parties. Certainly companies in banking, healthcare, finance, and certain types of manufacturing, as well as national defense-related agencies, need to interact with business partners, suppliers, vendors, customers and government or regulatory agencies (such as the Food & Drug Administration and the Census Bureau) on a daily basis. For every systematic interaction, these external data sources [3] generate meta data that should be extracted and brought into the MME.
1 See Chapter 7 of “Building and Managing the Meta Data Repository” (David Marco, Wiley 2000) for a more detailed walkthrough of these approaches
2 For a detailed discussion on Data Stewardship please see David Marco’s four-part series which ran in DM Review December, 2002 – March 2003
3 See Chapter 2 of “Building and Managing the Meta Data Repository” (David Marco, Wiley 2000) for a more detailed discussion of external meta data sources