File (.doc) to Structured Content

Case

Assume a Word document (.docx) is stored as a media file in a source system. That document contains all the details needed to populate a content type in the destination system.

While this media asset exists at the source as a document file, the destination is actually a structured content type with fields that need to be populated based on values in the source document.

We need to be able to:

  1. Parse the binary document into separate fields

  2. Map the parsed fields to the destination content type

Solution

ImpulseSync can solve this with a single job and a few manipulators. We will use both the file-to-text and liquid-field manipulators.

First we create a job to pick up the source media asset and sync it to the destination content type.

Next we set up the file-to-text manipulator.

This manipulator is configured to take the binary content of the asset and convert it into text. That text is stored in a new field called binaryAsText, which can be referenced later by other manipulators or by the content mapper.
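To make the conversion step more concrete, here is a minimal Python sketch of how text can be pulled out of a .docx binary. It is not ImpulseSync's implementation, which this page does not describe; the function name docx_to_text is hypothetical. It relies only on the fact that a .docx file is a ZIP archive whose main body is conventionally stored in word/document.xml, and it does not handle the legacy binary .doc format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(data: bytes) -> str:
    """Hypothetical helper: return the paragraph text found in .docx bytes."""
    # A .docx file is a ZIP archive; the body conventionally lives here.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    # One line of output per paragraph.
    return "\n".join(paragraphs)
```

The resulting string plays the role of the binaryAsText value in the steps that follow. The exact text ImpulseSync produces may differ in whitespace and structure.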

The next manipulator to configure is the liquid-field manipulator.

This manipulator is configured to create a new field called field1, using a Liquid template as its value.

liquid field value 1:

{{ content.en.fields.binaryAsText.value[1] | section: 'Booking code: ', 'Voyage name:  DO NOT CHANGE' }}

The Liquid template takes the value of the previously created binaryAsText field and applies the section filter to extract a specific section of the document. In this case, the text between Booking code: and Voyage name: DO NOT CHANGE is set as the value of the new field, field1.
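As an illustration of what the section filter does conceptually, the following Python sketch extracts the text between a start marker and an end marker. This is an assumption about the filter's behaviour, not its actual implementation: the function name, the whitespace trimming, and the handling of missing markers are all hypothetical, and the sample text is invented for the example.

```python
def section(text: str, start: str, end: str) -> str:
    """Hypothetical stand-in for the section filter.

    Returns the text between the first `start` marker and the next `end`
    marker. Trimming and missing-marker handling are assumptions.
    """
    begin = text.find(start)
    if begin == -1:
        return ""  # assumed behaviour: no start marker, no value
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        stop = len(text)  # assumed behaviour: read to the end of the text
    return text[begin:stop].strip()


# Invented sample standing in for the binaryAsText value.
sample = (
    "Booking code: ABC123\n"
    "Voyage name:  DO NOT CHANGE\n"
    "Northern Lights\n"
)

field1 = section(sample, "Booking code: ", "Voyage name:  DO NOT CHANGE")
print(field1)  # ABC123
```

Because the markers are matched literally in this sketch, they would need to match the document text exactly, including spacing.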

Similarly, two more fields, field2 and field3, are created using additional liquid-field manipulators.

liquid field value 2:

{{ content.en.fields.binaryAsText.value[1] | section: 'Voyage name:  DO NOT CHANGE', 'Link to images for approval: ' }}

liquid field value 3:

{{ content.en.fields.binaryAsText.value[1] | section: 'VOYAGE DETAILS:', 'Short description: ' }}

In total, the job now has four manipulators: one file-to-text manipulator and three liquid-field manipulators.

Finally, we configure the content mapper to map our newly created fields into the correct fields for the destination content type.

The fields are mapped as follows:

  • field1 -> bookingCode

  • field2 -> voyageName

  • field3 -> voyageDetails
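The mapping above can be pictured as a simple rename from source field names to destination field names. The Python sketch below shows that idea only; the content mapper itself is configured in ImpulseSync, and the function name, dictionary layout, and sample values here are hypothetical.

```python
# Source field -> destination field, mirroring the list above.
FIELD_MAP = {
    "field1": "bookingCode",
    "field2": "voyageName",
    "field3": "voyageDetails",
}


def map_fields(source: dict, field_map: dict) -> dict:
    """Hypothetical helper: keep only mapped fields, renamed for the destination."""
    return {dest: source[src] for src, dest in field_map.items() if src in source}


# Invented values standing in for the fields created by the manipulators.
source_fields = {
    "binaryAsText": "full document text",
    "field1": "ABC123",
    "field2": "Northern Lights",
    "field3": "Seven nights along the coast.",
}

entry = map_fields(source_fields, FIELD_MAP)
print(entry)
```

Note that the intermediate binaryAsText field is not mapped, so it does not appear in the destination entry in this sketch.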

Now we can run this job and view the synced content.
