File (.doc) to Structured Content
Last updated
Assume a Word document (.docx) is stored as a media file in a source system. That document has all the details on how to populate a content type in the destination system.
While this media asset exists at the source as a document file, the destination is actually a structured content type with fields that need to be populated based on values in the source document.
We need to be able to:
Parse the binary document into separate fields.
Map the parsed fields into the destination content type.
First we create a job to pick up the source media asset and sync it to the destination content type.
Next we set up the file-to-text manipulator.
This manipulator is configured to take the binary of the asset and convert it into text. That text will be stored in a new field called binaryAsText. This new binaryAsText field can be referenced later by other manipulators or the content mapper.
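To illustrate what a binary-to-text conversion involves, here is a minimal Python sketch. It is not ImpulseSync's implementation. It relies only on the fact that a .docx file is a ZIP archive whose main body lives in word/document.xml. The function names and the "binary" key on the item are hypothetical, chosen for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def add_binary_as_text(item: dict) -> dict:
    """Hypothetical manipulator step: add a binaryAsText field to the item."""
    return {**item, "binaryAsText": docx_bytes_to_text(item["binary"])}
```

The original item is left unchanged and a copy with the extra field is returned, which mirrors the idea that later steps read binaryAsText without touching the source binary.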
The next manipulator to configure is the liquid-field manipulator. This manipulator is configured to create a new field called field1 using a liquid template as the value.
liquid field value 1:
Similarly, 2 more fields are created using the liquid-field manipulator.
liquid field value 2:
liquid field value 3:
In total, the job now has four manipulators: one file-to-text manipulator and three liquid-field manipulators.
Finally, we configure the content mapper to map our newly created fields into the correct fields for the destination content type.
The fields are mapped as follows:
field1 -> bookingCode
field2 -> voyageName
field3 -> voyageDetails
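The mapping above amounts to renaming fields. The following Python sketch shows the idea only; the content mapper itself is configured in ImpulseSync, and the function name and sample values here are hypothetical.

```python
# Source field name -> destination field name, as listed above.
FIELD_MAPPING = {
    "field1": "bookingCode",
    "field2": "voyageName",
    "field3": "voyageDetails",
}


def map_fields(source: dict, mapping: dict) -> dict:
    """Copy each mapped source field to its destination field name."""
    return {dest: source[src] for src, dest in mapping.items() if src in source}


# Hypothetical values, as they might look after the liquid-field manipulators run.
item = {
    "binaryAsText": "full document text",
    "field1": "ABC123",
    "field2": "Northern Lights",
    "field3": "Seven nights, departing Tromso",
}
destination = map_fields(item, FIELD_MAPPING)
```

Fields that are not in the mapping, such as binaryAsText, are simply not carried over to the destination.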
Now we can run this job and view the synced content.
ImpulseSync can solve this with a single job and a couple of manipulators. We will use both the file-to-text and liquid-field manipulators.
The liquid template uses the previously created binaryAsText field's value and the to get a specific section of the document. In this case, the section between "Booking code:" and "Voyage name: DO NOT CHANGE" will be set as the value for the new field field1.
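The extraction described above, taking the text between two markers, can be sketched in Python. This is an illustration of the logic, not the liquid template itself; the helper name section_between and the sample document text are assumptions made for the example.

```python
def section_between(text: str, start: str, end: str) -> str:
    """Return the text between the first `start` marker and the next `end` marker.

    Returns an empty string if either marker is missing.
    """
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return ""
    return text[begin:stop].strip()


# Hypothetical text, standing in for the binaryAsText field of a real document.
binary_as_text = (
    "Booking code:\n"
    "ABC123\n"
    "Voyage name: DO NOT CHANGE\n"
    "Northern Lights\n"
)

field1 = section_between(binary_as_text, "Booking code:", "Voyage name: DO NOT CHANGE")
```

Searching for the end marker only after the start marker keeps the result correct even if the end marker's text also appears earlier in the document.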