In this article, you will find information and multiple examples on the following topics associated with working with JSON and XML data sources.JSON/XML Data Structure
Selecting JSON/XML Data
XPath Expressions (with examples)
Advanced Usage (with examples)
About the structure of JSON/XML data
JSON and XML are hierarchical data formats made up of:
- objects, one or more field name and value pairs, indicated in Klipfolio by curly braces
- arrays, lists of objects of the same type, indicated in Klipfolio by square brackets
Where tabular data is referenced by column and row, JSON/XML data is referenced by its path, which indicates its position in the hierarchy. This path, also called the XPath, describes the levels traversed to reach the object. For example, in the following data source,
- /player/first_name is the XPath of the first_nameobject
- /player/team/wha is the XPath of the wha object
Selecting JSON/XML Data
In Klipfolio, when you select data from a JSON/XML data source, the selected field’s path, or XPath, is displayed in the XPath editor. In the following image, the XPath of the name object, /player/team/name, is displayed in the XPath editor.
By default, the values of all peer elements are selected--this is equivalent to selecting a column of data in a table. If you need to change this selection, click Selection Options as shown in the image above, and select the required option:
- Select only this <field name> element. Only one specific <field_name> element is selected. This is equivalent to selecting a specific cell in a table such as CSV or Excel. The XPath editor displays the specific path for example: /player/team/name.
- Select all peer <field name> elements. All <field_name> elements at this level of the hierarchy are selected. This is equivalent to selecting a column of data in a table such as CSV or Excel. The XPath editor displays the path, for example: /player[*]/team/name.
- Select all <field_name> elements. All <field_name> elements at any level of the hierarchy are selected.
- Select all <field_name> elements in this parent object element. All <field_name> elements in this branch of the hierarchy are selected.
A JSON or XML data source sometimes requires special handling to access its objects and arrays due to its tree structure with varying branches and sub-branches. This typically involves editing the XPath to specify how to traverse the data hierarchy.
For example, when the last_name field is selected, all last_name values in the array are returned (equivalent to selecting a column in a table such as CSV or Excel: Hedberg, Richard, Gretzky. To select a specific last_name value equivalent to selecting a specific cell in a table):
- Select the last_name field. /player/last_name is displayed in the XPath Editor.
- Edit the XPath:
- Click XPath to open the XPath editor.
- To select only the first last_name value, Hedberg, change /player/last_name to /player/last_name.
- To select only the second last_name value, Richard, change /player/last_name to /player/last_name.
- Click Simple to close the XPath editor.
This returns the same result as selecting a specific last_name field and then choosing Selection Option < Select only this last_name element, and is shown here to provide a simple example of using the XPath Editor.
An XPath expression is used to select objects based on specified criteria.
Example 1: select objects by matching a specific value
Select the last names of players on team Winnipeg.
- Select the last_name object. /player/last_name is displayed in the XPath Editor. This will return all player/last_name objects.
- Edit the XPath: click XPath to open the XPath editor and
insert the expression, [team/name='Winnipeg'],
after /player to select only the last names of players on team Winnipeg. The resulting XPath:
/player[team/name='Winnipeg']/last_namesays: for all player objects with descendant team/name equal to Winnipeg, select the last_name object.
- Click Simple o close the XPath editor.
Example 2: select objects based on field name
Select all objects whose name ends in "name".
This example uses the substring (string, start, length) function which returns the characters from string beginning at the start position and ending at (start+length) position. If length is not specified, substring will return the characters from the start position to the end of string For example, substring ('Klipfolio', 1, 4 ) returns 'Klip'.
- Select any child object of player, such as first_name.
- Edit the XPath: click XPath to open the XPath editor and type:
This XPath says: for all /player objects, select all child nodes whose names end in name, that is first_name, last_name, nickname. This expression returns Anders, Hedberg, Maurice, Richard, Rocket, Wayne, Gretzky, The Great One.
- Click Simple to close the XPath editor.
Example 3: select all objects in an array
/player/* selects all immediate descendant objects in the “player” array.Anders
The Great One
/player/* selects all immediate descendant objects (denoted by /*) in the second element of the "player" array.Maurice
/player/*/* selects all second level descendant objects (denoted by /*/*) in the third element of the "player" arrayEdmonton
Example 4: select JSON objects whose names begin with non-alphabetic/non-numeric characters
JSON objects with names that begin with non-alphabetic/non-numeric characters (as shown in the following data source) cannot be selected directly. Instead it is necessary to manually type the path in the XPath editor.
To select the @date object in the following data source, edit the XPath and type:
Similarly, to select the @value object in the following data source, edit the XPath and type:
To select all objects named @date in the first element of the "day" array:
Example 5: select JSON objects whose names begin with numeric characters
JSON objects with names that begin with numeric characters cannot be selected directly. Instead it is necessary to manually edit the path in the XPath editor and use the starts-with XPath function. For example to select
XPath axes (child, descendant, following-sibling, preceding-sibling) indicate the position of an object relative to another object in the hierarchy. For example,
- first_name, last_name, nickname, teams/name, teams/wha and teams/nhl are descendants of player
- first_name, last_name and nickname are child nodes of player
- first_name, last_name, nickname and teams are siblings of each other, preceding-sibling or following-sibling depending on their relative positions:
- first_name is a preceding-sibling of last_name
- last_name is a following-sibling of first_name
- Unnamed fields cannot be selected directly; see kf:element_at in XPath Functions.
- Field names that begin with a non-alphabetic character cannot be selected directly; see Examples 4 and 5 in XPath Expressions.
To select first_names for all players with a nickname descendant:
Similarly, to select all first_names with a nickname following-sibling (note this will return the same values as /player[descendant::nickname]/first_name):
To select all last_names of players who played in the WHA:
Sometimes more complex XPath manipulation is required to access and align JSON/XML data.The following table summarizes the XPath functions available for modifying an object’s XPath. Note that not all functions are supported for both JSON and XML formats.
Syntax: kf:element_at( object, index )
Select an element at the position specified by index. This function is used when a field name is unnamed and is instead referenced by number.
For example, using the following data source:
This function is used to “fill in” blanks for an element that is not included in every record. This is useful for data alignment.
/player/first-name returns Anders, Maurice, Wayne
/player/nickname returns Rocket, The Great One
As is, the data incorrectly aligns Anders with the nickname, Rocket, and Maurice with the nickname, The Great One. Use kf:fill_elements to align this data correctly:
To fill in the /data/shares/count field for every item in the /data element, specify each level in the path as a 'fieldname' parameter.
Note that this function does not work for nested arrays.
Syntax: kf:names( object )
The kf:names function returns the names of all the fields contained in an object. This is useful when the field names identify categories such as demographics or countries.
For example, Facebook can return page fans data by gender and age. To create a list of these categories for a table column or bar/line chart x-axis, use:
which returns data shown in the following image.
Compare with name(/data/values/value/M.25-34) which returns only value: M.25-34
The following show more examples using the datasource featured at the top of this article
kf:names(/player/team) returns name, wha
kf:names(/player/team) returns names for all items in the player array:
The DATASOURCE and MAP functions can be used together for more flexible XPath editing.
Example 1: select objects matching a set of values
In the XPath Expressions section, Example 1 selected objects by matching a specific value, where
selected the last_name for all /player objects with descendant team/name equal to Winnipeg. To select last_name objects matching multiple values, this formula can be extended using the MAP function and the DATASOURCE function.
- Convert the XPath reference, /player[team/name='Winnipeg']/last_name, to text and replace the hard-coded value, Winnipeg, with a variable, city
Wrap the DATASOURCE function around this to specify which data source (dsID) to apply this reference to:
- Then wrap the MAP function to repeat the XPath selection over a set of values (Winnipeg, Montreal, Toronto):
CONCAT(/player[team/name=' city, ']/last\_name)
If city= Winnipeg, this formula returns /player[team/name='Winnipeg']/last_name.
If city = Montreal, this formula returns /player[team/name='Montreal']/last_name.
CONCAT(/player[team/name=', city, ']/last\_name)) )
MAP( ARRAY( Winnipeg, Montreal, Toronto),
DATASOURCE( dsID,JOIN (CONCAT(/player[team/name=', city, ']/last\_name)) ) )
Note that because the MAP function can return only 1 value for each item in the first parameter, it is necessary to wrap the JOIN function around the CONCAT to convert a list of values to a single, comma-separated value.