File (.docx) to Structured Content - Expanded

Case

This case expands the basic File to Structured Content recipe using the Text to Reference - Liquid recipe.

As such, we assume a Word document (.docx) is stored as a media file in a source system. That document has all the details on how to populate a content type in the destination system.

While this media asset exists at the source as a document file, the destination is actually a structured content type with fields that need to be populated based on values in the source document.

We need to be able to:

  1. Parse the binary document into separate fields

  2. Map parsed fields into destination content type

    1. field types include number, various text fields, and reference/relationship fields.

Solution

ImpulseSync can solve this with a set of jobs and manipulators. We will use the file-to-text, store-field, and liquid-field manipulators (the last together with the getStoredValue filter).

Configure pre-requisite jobs

Because we have relationship fields to be built from the document, we need to set up a couple of pre-requisite jobs that will store the IDs of the content to be referenced. In this example we have two pre-req jobs.

The destination system already has the dependent content created. So we need ImpulseSync to pick up and store the IDs of that content so that relationships to it can be created later. These pre-req jobs will do just that.

Both jobs will only have a source endpoint configured. Since we're only interested in picking up content to store its data, we do not need a destination endpoint to deliver to.

Both jobs will use the store-field manipulator to store the content ID using a unique identifier of the content as the key. In this case the key will be the name of the content.

The config to store the content ID of the ships.

And this is the config to store the content ID of the itinerary days.

Notice how the Applyon field is set to "Write". This is because we have no destination endpoint to deliver to. So we want to apply these manipulators immediately after the content is picked up, rather than before the content is delivered.

Additionally, we are using the optional config parameters Storekey and Storevalue for the manipulators. These parameters allow us to use liquid templates to generate a specific key and value for storage. The Storekey value will be used in the last job to retrieve the stored value.
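The idea behind these two pre-req jobs can be sketched outside of ImpulseSync as a plain key-value store. The sketch below is only an illustration in Python, not ImpulseSync's implementation: the item shape, the `ship-` key prefix, and the `store_field` helper are all hypothetical.

```python
# Hypothetical content items, as a source endpoint might return them.
ships = [
    {"id": "ship-001", "name": "Aurora"},
    {"id": "ship-002", "name": "Borealis"},
]

store: dict[str, str] = {}

def store_field(item: dict, store: dict[str, str]) -> None:
    """Store the item's ID under a key derived from its name."""
    key = f"ship-{item['name']}"  # plays the role of the Storekey template
    store[key] = item["id"]       # plays the role of the Storevalue template

for ship in ships:
    store_field(ship, store)
```

The key must be something the primary job can rebuild from the text of the document, which is why the name of the content is used here rather than an internal ID.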

Configure primary job

The last job to create is the one that delivers the file as structured content.

This job does have a destination endpoint, since we want to actually deliver the file as content to the destination system.

This job will also have multiple manipulators.

In this case we have 10 manipulators being used; however, more or fewer manipulators can be used depending on the format of the doc and the destination content type.

Because we're syncing a file to a content, we must first run the file-to-text manipulator. This will convert the file into a text field which can then be manipulated appropriately.
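To give a feel for what a file-to-text conversion involves, here is a minimal Python sketch that pulls paragraph text out of a .docx file using only the standard library. It is not how the file-to-text manipulator is implemented; it relies only on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`. The `make_sample_docx` helper is hypothetical and builds a stripped-down archive for demonstration, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_to_text(data: bytes) -> str:
    """Extract paragraph text from the main document part of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [t.text or "" for t in p.iter(f"{{{W}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)

def make_sample_docx(lines: list[str]) -> bytes:
    """Build a minimal archive containing only word/document.xml (demo only)."""
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()

sample = make_sample_docx(["Title: Northern Lights Cruise", "Ship: Aurora"])
text = docx_to_text(sample)
```

The resulting `text` is the kind of plain string that later steps can parse into individual fields.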

Any manipulator that works on the converted text field must have an order value greater than the file-to-text manipulator's order value. In this case they will all have an order of 1 or greater.
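The reason for the ordering rule can be shown with a small Python sketch. This is an illustration under assumptions, not ImpulseSync internals: the manipulator functions, the dictionary shape, and an order of 0 for the text conversion are all hypothetical.

```python
def file_to_text(item: dict) -> dict:
    # Stand-in for the conversion step: creates the binaryAsText field.
    item["binaryAsText"] = item["binary"].decode("utf-8")
    return item

def make_title(item: dict) -> dict:
    # Depends on binaryAsText, so it must run after file_to_text.
    item["title"] = item["binaryAsText"].splitlines()[0]
    return item

# Deliberately listed out of order; the order value decides the sequence.
manipulators = [
    {"order": 1, "apply": make_title},
    {"order": 0, "apply": file_to_text},
]

item = {"binary": b"Northern Lights Cruise\nSeven nights"}
for manipulator in sorted(manipulators, key=lambda m: m["order"]):
    item = manipulator["apply"](item)
```

If `make_title` ran first, the `binaryAsText` field would not exist yet and the lookup would fail.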

Each liquid-field manipulator used to create a new field will use the new field binaryAsText to parse the text of the doc.

This recipe won't cover all the different liquid-field manipulators configured, however it will show an example for each of the different field types to be created.

Using a liquid-field manipulator we can create a String field.

Similarly, we can create a WYSIWYG field.

We can also create a decimal (or any other supported number field).
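The parsing these three field types require can be sketched in Python as follows. In the recipe itself this work is done by Liquid templates inside the liquid-field manipulators; the code below only mirrors the idea. The "Label: value" layout of the document text, the field names, and the `extract` helper are assumptions made for the example.

```python
import re
from decimal import Decimal

# Hypothetical text as it might appear in binaryAsText after conversion.
binary_as_text = """Title: Northern Lights Cruise
Price: 1499.50
Description: Seven nights along the coast.
"""

def extract(label: str, text: str) -> str:
    """Return the value following 'label:' on its own line, or '' if absent."""
    match = re.search(rf"^{re.escape(label)}:\s*(.*)$", text, flags=re.MULTILINE)
    return match.group(1).strip() if match else ""

title = extract("Title", binary_as_text)                               # string field
description_html = f"<p>{extract('Description', binary_as_text)}</p>"  # WYSIWYG field
price = Decimal(extract("Price", binary_as_text))                      # decimal field
```

Each destination field gets its own extraction, which matches the recipe's approach of one manipulator per field.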

We can also create a relationship field to relate ships using ship content IDs we previously stored.

Notice the use of the liquid filter getStoredValue. This will use the storeKey value to retrieve whatever value is stored. We then strip any additional whitespace that may be added and set it as the childId value.
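The lookup-then-strip behaviour described above can be mirrored in Python. This is a sketch of the logic only, not the getStoredValue filter itself: `get_stored_value`, the `ship-` key prefix, and the stored data are hypothetical, and the stray whitespace is simulated in the stored value.

```python
# Hypothetical store contents, as left behind by a pre-req job.
store = {"ship-Aurora": " ship-001\n"}

def get_stored_value(key: str, store: dict[str, str]) -> str:
    """Return the value stored under key, or '' when nothing is stored."""
    return store.get(key, "")

def ship_relationship(ship_name: str, store: dict[str, str]) -> dict:
    """Build a single-reference field from a ship name found in the document."""
    child_id = get_stored_value(f"ship-{ship_name}", store).strip()
    return {"childId": child_id}

relationship = ship_relationship("Aurora", store)
```

Without the strip step, surrounding whitespace would end up inside the ID and the reference would not resolve.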

Also notice the Impulsefieldvalue config param. It is used specifically to create a relationship field from a liquid template. More details can be found in the liquid documentation pages.

In a similar manner, we can create an array of referenced contents.

This liquid template again uses the Impulsefieldvalue option and the getStoredValue liquid filter. It also uses a for loop and if statements in the liquid template to build a proper Motation relationship field.
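The loop-and-condition pattern can be sketched in Python as shown here. It illustrates the logic rather than the Liquid template used in the job: the `day-` key prefix, the stored IDs, and the output shape (a list of `childId` objects) are assumptions for the example.

```python
# Hypothetical store contents, as left behind by the itinerary-day pre-req job.
store = {"day-Day 1": "day-101", "day-Day 2": "day-102"}

def day_relationships(day_names: list[str], store: dict[str, str]) -> list[dict]:
    """Build a list of references, skipping names with no stored ID."""
    references = []
    for name in day_names:                              # the for loop
        child_id = store.get(f"day-{name.strip()}", "").strip()
        if child_id:                                    # the if statement
            references.append({"childId": child_id})
    return references

days = day_relationships(["Day 1", " Day 2 ", "Day 9"], store)
```

The condition keeps empty references out of the array when the document names a day that was never stored.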

Configure pipeline

Because these jobs make use of storage with the store-field manipulator and the getStoredValue liquid filter, we need to make sure these values are stored for long enough. If you pay for storage space, you can set the Storecontent job option on the endpoints of these jobs to permanently store the values.

However, we'll assume you do not want to permanently store these values. Instead, you will need to create a pipeline using all three of these jobs.

This pipeline consists of 2 steps.

The first step runs both pre-requisite jobs to store the content ID data.

The second step runs the job which will sync the doc into a structured content. This job will use the data stored from the pre-req jobs in the previous step. And since these jobs are all in a pipeline without the Storecontent job option set, the stored data is temporary and will be deleted once the pipeline has finished running.
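The two-step flow and the temporary nature of the stored data can be modelled with a short Python sketch. This is a conceptual model only, not how ImpulseSync pipelines are implemented: the job functions, the store, and the `run_pipeline` helper are hypothetical.

```python
def store_ships(store: dict, delivered: list) -> None:
    store["ship-Aurora"] = "ship-001"      # pre-req job 1

def store_days(store: dict, delivered: list) -> None:
    store["day-Day 1"] = "day-101"         # pre-req job 2

def sync_document(store: dict, delivered: list) -> None:
    # Primary job: reads the stored IDs and delivers the structured content.
    delivered.append({
        "ship": {"childId": store["ship-Aurora"]},
        "days": [{"childId": store["day-Day 1"]}],
    })

def run_pipeline(steps: list[list]) -> tuple[list, dict]:
    """Run each step's jobs in order, then discard the temporary store."""
    store, delivered = {}, []
    try:
        for step in steps:
            for job in step:
                job(store, delivered)
    finally:
        store.clear()  # stored data lives only for the duration of the run
    return delivered, store

delivered, store_after = run_pipeline([[store_ships, store_days], [sync_document]])
```

Putting the pre-req jobs in an earlier step guarantees the IDs exist before the primary job looks them up.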

Review synced content

Now we can run this pipeline and view the synced content.
