If you are a regular user of the vlookup function in Microsoft Excel, the Table Lookup transform should feel very familiar. It’s used to perform essentially the same function. Unlike the Microsoft Excel version, the PlaidCloud Analyze Table Lookup transform offers greater flexibility, especially allowing for matching on and returning multiple columns.
Table Data Selection
Specify the source data table by selecting it from the dropdown menu.
Specify any columns to be included here. Selecting the Inspect Source and Populate Source Mapping Table buttons will make these columns available for the join operation.
Select Subset of Source Data
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.
To establish the target table select either an existing table as the target table using the Target Table dropdown or click on the green "+" sign to create a new table as the target.
When creating a new table you will have the option to either create it as a View or as a Table.
Views are useful in that the time required for a step to execute is significantly less than when a table is used. The downside of views is they are not a useful for data exploration in the table Details mode.
When using a table as the target a step will take longer to execute but data exploration in the Details mode is much quicker than with a view.
Specify join conditions. Using the Guess button will find all matching columns from both Table 1 as well as Table 2. To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column names to match from Table 1 to Table 2. To remove a field from the Join Map, simply right-click and select Delete.
Target Output Columns
Data Mapper Configuration
The Data Mapper is used to map columns from the source data to the target data table.
Inspection and Populating the Mapper
Using the Inspect Source menu button provides additional ways to map columns from source to target:
- Populate Both Mapping Tables: Propagates all values from the source data table into the target data table. This is done by default.
- Populate Source Mapping Table Only: Maps all values in the source data table only. This is helpful when modifying an existing workflow when source column structure has changed.
- Populate Target Mapping Table Only: Propagates all values into the target data table only.
If the source and target column options aren’t enough, other columns can be added into the target data table in several different ways:
- Propagate All will insert all source columns into the target data table, whether they already existed or not.
- Propagate Selected will insert selected source column(s) only.
- Right click on target side and select Insert Row to insert a row immediately above the currently selected row.
- Right click on target side and select Append Row to insert a row at the bottom (far right) of the target data table.
To delete columns from the target data table, select the desired column(s), then right click and select Delete.
Chaging Column Order
To rearrange columns in the target data table, select the desired column(s). You can use either:
- Bulk Move Arrows: Select the desired move option from the arrows in the upper right
- Context Menu: Right clikc and select Move to Top, Move Up, Move Down, or Move to Bottom.
Reduce Result to Distinct Records Only
To return only distinct options, select the Distinct menu option. This will toggle a set of checkboxes for each column in the source. Simply check any box next to the corresponding column to return only distinct results.
Depending on the situation, you may want to consider use of Summarization instead.
The distinct process retains the first unique record found and discards the rest. You may want to apply a sort on the data if it is important for consistency between runs.
Aggregation and Grouping
To aggregate results, select the Summarize menu option. This will toggle a set of select boxes for each column in the target data table. Choose an appropriate summarization method for each column.
- Group By
- Count (including nulls)
- Standard Deviation
- Sample Standard Deviation
- Population Standard Deviation
- Sample Variance
- Population Variance
- Advanced Non-Group_By
For advanced data mapper usage such as expressions, cleaning, and constants, please see the Advanced Data Mapper Usage
To allow for maximum flexibility, data filters are available on the source data and the target data. For larger data sets, it can be especially beneficial to filter out rows on the source so the remaining operations are performed on a smaller data set.
Select Subset Of Data
This filter type provides a way to filter the inbound source data based on the specified conditions.
Apply Secondary Filter To Result Data
This filter type provides a way to apply a filter to the post-transformed result data based on the specified conditions. The ability to apply a filter on the post-transformed result allows for exclusions based on results of complex calcuations, summarizaitons, or window functions.
Final Data Table Slicing (Limit)
The row slicing capability provides the ability to limit the rows in the result set based on a range and starting point.
The filter syntax utilizes Python SQLAlchemy which is the same syntax as other expressions.
View examples and expression functions in the Expressions area.
Lookup Product Dimension Information
In this example, the modeler needs information from the product dimension table to make sense of the order fact table. As such, the Import Order Fact table is selected as the Source Table. The Import Product Dim table contains the desired lookup information, so it’s selected as the Lookup Table Source. Although available, no filters are applied to the lookup data table (nor any other data tables, for that matter).
In the Table Data Selection section, all columns are mapped from the source data table to the target data table.
No Data Filters are applied to either source or target data.
Lastly, the source data table is matched to the lookup data table using the Product_ID field found in each table. Only the Product_Description and Unit_Cost columns are appended to the target data table, with Unit_Cost being renamed to Retail_Unit_Cost in the process.
In the resulting target data table, the Product_Description and Retail_Unit_Cost columns have been added, based on matching values in the Product_ID column.