官术网_书友最值得收藏!

Importing data as text

Tabular modeling natively supports the import of files with extensions of text (.txt), comma separated values (.csv), and tab separated values (.tab). Once a file to import has been defined, the user can specify the delimiter (column separator), and the importing interface scans the file to estimate the underlying data types for each column. Finally, the interface finishes the import process by loading the data according to this specification.

Getting ready

A text file to simulate the output file for the sales record header has been created and is available from the online resources for this chapter. This file includes the sales_order_id, customer_id, employee_id, currency_id, customer_po_id, sales_territory_id, order_dt (order date), due_dt (due date), ship_dt (ship date), and sales_amount fields.

The file will be imported as a table into PowerPivot (and the tabular model). A screenshot of the file is shown as follows:

Getting ready

How to do It…

Most data imports can be managed from the PowerPivot window:

  1. Launch the PowerPivot window.
  2. In the Home tab, click on the From Text button from the Get External Data grouping.
  3. Call the Sales Header TXT connection.
  4. Ensure that the file type Comma Separated Files (.csv) is selected from the drop-down box.
  5. Navigate to the resources listed in Chapter 2, Importing Data and select the 02_master.csv file.
  6. Click on Open.
  7. A new window will open showing the file structure. Ensure that Comma (,) is selected from the Column Separator drop-down list, and that the Use first row as column headers checkbox is ticked.
  8. Confirm the import by clicking on Finish, as shown in the following screenshot:
    How to do It…
  9. The data will load (click on Close to exit the Table Import Wizard window).

    Tip

    The Table Import Wizard displays the data that will be imported in columns and rows. Generally, it may be said that this grid is shared across all types of imports. You can choose not to import a particular column by deselecting the column from the header field selection (note that in the preceding screenshot, all columns are selected). Additionally, the amount of data to be imported can be filtered by selecting the drop-down arrow next to a field and applying a filter through the graphical interface. It is considered best practice to import only those columns which are needed in the model, because the extra columns or rows will increase the size of the model.

How it works…

Once the file name has been provided, the Table Import Wizard window uses a text driver to scan the first 200 rows of the file and determine the data type of each column. The text driver then uses this definition to import all the data for the file. The columns order_dt, due_dt, and ship_dt have been defined as dates, even though the file did not explicitly specify a date type (after all it is text).

Once the wizard has determined the data types for columns, it will use these types for the entire load. This may create issues where the value within the file does not conform to the data type specifications. In such a situation, the specific value will be discarded from the load. The row will be imported even if all the values are discarded. This type of situation can occur when a numeric value is expected, and a text value is found in the file, or the expected date format for the import cannot be derived from the underlying date value. For example, when the column format is DD-MM-YY, and the value follows the format MM-DD-YY.

The wizard can be defined for different delimiters. Tab, Comma, Semicolon, Space, Colon, and Vertical Bar (Pipe) are supported. This is regardless of the file extension (so you could import a .csv file with a Tab delimiter). However, the importer only allows you to import files that have extensions of .csv, .tab, or .txt. Additionally, you can specify if the first row of the file contains headings (and these are also imported).

There's more...

Although you can specify standard delimiters, the file may contain an unsupported character as its delimiter, and the data format chosen by the wizard may simply be the wrong type for the file. To overcome these issues (or the issues they may create), the text driver defaults (which are defined by the Wizard) can be overridden, so that the input file can be fully defined as load time. This is done by using a defining file titled schema.ini. This file needs to be saved in the same location as the file being imported. This offers the modeler full control over the specification of the import file. Consider the previous text import file with the following changes;

  • The delimiter is now a tilde (~)
  • The date format is MM-DD-YYYY (note that the local for the machine used in this book specified DD-MM-YYYY)

Repeat the recipe using the 02_master_tilde.csv file ensuring that the schema.ini file exists in the same directory. When the Wizard loads, an information box at the bottom of the window will indicate the presence of a schema.ini file, and this file will be used to specify import settings, as shown in the following screenshot:

There's more...

The schema.ini file is a text file and can therefore be examined in notepad—a sample of our file is shown in the next screenshot. In order for it to be used, it must reside in the same folder as the file(s) being imported. The following are some universal notes about the structure of the file that should be mentioned:

  • The name of the file for which the schema is to be applied is specified as the first line. Note that the import is specified for a 02_master_tilde.csv file
  • The date format is specified (see the row DateTimeFormat) and is applied to each date field
  • The delimiter is specified as a tilde
  • Each column (by number) can specify a name different from the file's header row
    There's more...

Naturally, there can only be one schema.ini file in a directory. However, the same file can be used to specify the format of individual files. This is achieved by simply extending the next specification with the name of the file, as has been done for 02_master_tilde.csv.

Note

Further information about the format of the schema.ini file can be found at http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx.

主站蜘蛛池模板: 张家口市| 肇州县| 潢川县| 五河县| 襄城县| 横峰县| 高淳县| 东乌珠穆沁旗| 威远县| 应城市| 利辛县| 杨浦区| 南投市| 江北区| 田林县| 宜城市| 伊金霍洛旗| 廊坊市| 鹤峰县| 唐山市| 石柱| 焦作市| 天祝| 苏州市| 万宁市| 东乌| 广昌县| 宝兴县| 新宾| 黎城县| 永泰县| 凤凰县| 志丹县| 西青区| 郎溪县| 开江县| 商洛市| 华池县| 徐水县| 阳高县| 桐城市|