Building the Textual Data Warehouse
For years, corporate decisions have been made on the basis of data found in transaction-based systems. Transaction-oriented data fits well with standard database management systems because those systems structure data repetitively: every occurrence of data in a table has the same structure as every other occurrence. But there is another important source of data in the corporation: the information found in the form of text. Text takes many forms in the corporation, including emails, spreadsheets, contracts, warranties, and medical and healthcare information. Because text is not repetitive, it does not fit easily into standard database management systems. Textual ETL now makes it possible to build databases and data warehouses that contain textual information. When text can be transformed so that it fits inside a standard database management system, whole new opportunities for analysis and decision making are created.
This two-day lecture/workshop covers what is required to create the textual, unstructured data warehouse. The first day is a lecture and the second day is a hands-on workshop. Lecture topics include:
- An Introduction To Unstructured Data
- Issues of Textual Integration
- Forms of Text
- Diverse Indexes
On day 2, Textual ETL will be run to produce a variety of databases and data warehouses, using many of the features of Textual ETL. Attendees will observe and participate in the transformation of text into a database ready for analytic processing.
The workshop begins by examining some textual data and discussing a strategy for capturing and organizing the text. It then continues with several types of processing, performed live with the attendees observing and participating. The types of processing include:
- document metadata capture
- document fracturing
- named value indexing
- simple indexing
- semistructured indexing
- merged indexing
Depending on the textual data selected, some or all of these kinds of indexes will be created.
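To make the idea of text landing in a standard database concrete, here is a minimal Python sketch that loads two small documents into relational tables. It is not Textual ETL and makes no claim about how that product works. The sample documents, the table and function names, the stop-word list, and the "Name: value." pattern are all assumptions invented for this illustration. The sketch covers only rough analogues of document metadata capture, simple indexing, and named value indexing.

```python
import re
import sqlite3

# Hypothetical sample documents, invented for this illustration.
DOCUMENTS = {
    "contract_001.txt": "Warranty period: 24 months. Customer: Acme Corp. "
                        "The pump is covered against defects.",
    "email_017.txt": "Customer: Acme Corp. The pump failed after the warranty expired.",
}

# Assumed stop-word list; a real one would be far longer.
STOP_WORDS = {"the", "is", "a", "after", "against"}


def simple_index(text):
    """Yield (word, position) for every alphabetic word that is not a stop word."""
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word not in STOP_WORDS:
            yield word, match.start()


def named_value_index(text):
    """Yield (name, value) pairs for text shaped like 'Name: value.'."""
    for match in re.finditer(r"([A-Z][A-Za-z ]*?):\s*([^.]+)\.", text):
        yield match.group(1).strip().lower(), match.group(2).strip()


def build_warehouse(documents):
    """Load the documents into an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_metadata (doc_name TEXT PRIMARY KEY, char_count INTEGER)")
    conn.execute("CREATE TABLE simple_index (doc_name TEXT, word TEXT, position INTEGER)")
    conn.execute("CREATE TABLE named_value (doc_name TEXT, name TEXT, value TEXT)")
    for doc_name, text in documents.items():
        # Document metadata: here just the name and the length in characters.
        conn.execute("INSERT INTO doc_metadata VALUES (?, ?)", (doc_name, len(text)))
        conn.executemany(
            "INSERT INTO simple_index VALUES (?, ?, ?)",
            [(doc_name, word, pos) for word, pos in simple_index(text)],
        )
        conn.executemany(
            "INSERT INTO named_value VALUES (?, ?, ?)",
            [(doc_name, name, value) for name, value in named_value_index(text)],
        )
    conn.commit()
    return conn


def docs_mentioning(conn, word):
    """Return the sorted names of documents whose simple index contains the word."""
    rows = conn.execute(
        "SELECT DISTINCT doc_name FROM simple_index WHERE word = ? ORDER BY doc_name",
        (word.lower(),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    warehouse = build_warehouse(DOCUMENTS)
    print(docs_mentioning(warehouse, "warranty"))
```

Once the text is held as rows, ordinary SQL can answer questions such as which documents mention a given word or which customer a document names. That is the kind of analysis the workshop aims to enable.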