ImpulseSync™ User Manual

HTML to Structured Content


Last updated 1 year ago

Case

Assume we want to move an HTML webpage into a structured content type. One of the fields is a reference to an author entry. In this example, the HTML is taken from the following article: https://www.espn.com/college-football/story/_/id/38491515/what-relegation-college-football-look-like.

Section of Source HTML

Destination Content Type

So we want to take the source HTML and move it into a destination content type.

We need to be able to:

  1. Parse HTML into multiple fields

  2. Create a reference to the article author

Solution

The first job is the "Align Author" job.

Because this job is used to align content, the mapping must specify which fields to align.

The aligner is set to align the source field author_name to the destination field title. When the values of these two fields match exactly, ImpulseSync aligns the two entries and creates an ID map between them for later transactions to use.
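As a rough illustration of the exact-match rule, the Python sketch below pairs source and destination entries whose field values are identical and records each pair in a dictionary that stands in for the ID map. The data shapes, the `align_exact` name, and the first-match-wins handling of duplicate titles are assumptions for illustration, not ImpulseSync internals.

```python
def align_exact(source_items, dest_items, source_field="author_name", dest_field="title"):
    """Pair source and destination entries whose field values match exactly.

    Hypothetical illustration: entries are dicts with an "id" key, and the
    returned ID map is a plain {source_id: destination_id} dictionary.
    """
    # Index destination entries by the value being aligned on.
    # If two destinations share a value, the first one wins (an assumption).
    by_value = {}
    for item in dest_items:
        by_value.setdefault(item.get(dest_field), item["id"])

    id_map = {}
    for item in source_items:
        value = item.get(source_field)
        # Exact, case-sensitive comparison: "John Roe" does not match "john roe".
        if value is not None and value in by_value:
            id_map[item["id"]] = by_value[value]
    return id_map


sources = [{"id": "s1", "author_name": "Jane Doe"}, {"id": "s2", "author_name": "John Roe"}]
dests = [{"id": "d7", "title": "Jane Doe"}, {"id": "d8", "title": "john roe"}]
print(align_exact(sources, dests))  # {'s1': 'd7'}
```

Entries without an exact match are simply left out of the map, which mirrors the "exactly match" condition described above.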

When a job is used only for aligning, the Nodeliver job option can be set on the destination endpoint so that no content is delivered. The job will still align content and create ID maps accordingly.

This job will also use a liquid field manipulator.

The config for this manipulator will create a new field author_name and set the value based on a liquid template.

The template queries the source HTML value for a tag with the classes .author and .has-bio, removes any tag with the class .timestamp, and returns the text value of what remains. That text is then split on commas (,), and the first element of the resulting array is set as the template's output.
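ImpulseSync performs these steps with a liquid template; the Python sketch below only mirrors the same sequence (query by classes, drop .timestamp, take the text, split on commas, keep the first element) so the logic is easy to follow. It uses the standard-library html.parser; the helper names and the sample markup are hypothetical and not taken from the ESPN page or from ImpulseSync's liquid filters.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal element node: tag name, CSS classes, and children (Node or str)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.classes = set((dict(attrs).get("class") or "").split())
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    """Builds a small element tree using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def find_all(node, classes):
    """Depth-first list of descendant elements carrying every class in `classes`."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if classes <= child.classes:
                found.append(child)
            found.extend(find_all(child, classes))
    return found


def text(node, skip=frozenset()):
    """Concatenate descendant text, leaving out elements that carry any class in `skip`."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif not (child.classes & skip):
            parts.append(text(child, skip))
    return "".join(parts)


def author_name(html):
    """Text of the .author.has-bio element minus .timestamp, up to the first comma."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    matches = find_all(builder.root, {"author", "has-bio"})
    if not matches:
        return ""
    raw = text(matches[0], skip={"timestamp"})
    return raw.split(",")[0].strip()


sample = ('<div class="author has-bio"><span>Jane Doe</span>, Staff Writer'
          '<span class="timestamp">Aug 23, 2023</span></div>')
print(author_name(sample))  # Jane Doe
```

Removing .timestamp first keeps its text out of the value that gets split, so only the byline text before the first comma survives.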

The second job is the "HTML Sync" job. When this job runs it will take the source HTML value, parse it, and set it into the structured content at the destination.

This job has three liquid manipulators and one relationship manipulator configured.

The first liquid manipulator creates the field html_title.

The config for this manipulator uses a liquid template to parse the source HTML for a tag with the class .article-header. This value is then set for the html_title field.

The second liquid manipulator creates the field date.

The config for this manipulator uses a liquid template to parse the source HTML for a tag with the class .timestamp inside a tag with the classes .author and .has-bio. This value is then set for the date field.
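The html_title and date manipulators described above can be approximated with the Python sketch below. It is not the liquid template ImpulseSync runs; it uses the standard-library html.parser, and the helper names, the sample markup, and the assumption that the date's container carries both the author and has-bio classes (as in the Align Author template) are illustrative only.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal element node: tag name, CSS classes, and children (Node or str)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.classes = set((dict(attrs).get("class") or "").split())
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    """Builds a small element tree using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, classes):
    """Depth-first list of descendant elements carrying every class in `classes`."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if classes <= child.classes:
                found.append(child)
            found.extend(find_all(child, classes))
    return found


def text(node):
    """All descendant text of an element, concatenated."""
    return "".join(c if isinstance(c, str) else text(c) for c in node.children)


def html_title(html):
    """Text of the first element with the class article-header."""
    matches = find_all(parse(html), {"article-header"})
    return text(matches[0]).strip() if matches else ""


def article_date(html):
    """Text of the first .timestamp nested inside an element with classes author and has-bio."""
    for container in find_all(parse(html), {"author", "has-bio"}):
        stamps = find_all(container, {"timestamp"})
        if stamps:
            return text(stamps[0]).strip()
    return ""


sample = ('<header class="article-header"><h1>Sample headline</h1></header>'
          '<span class="timestamp">not this one</span>'
          '<div class="author has-bio">Jane Doe<span class="timestamp">Aug 23, 2023</span></div>')
print(html_title(sample))    # Sample headline
print(article_date(sample))  # Aug 23, 2023
```

Scoping the .timestamp search to the author container is what keeps an unrelated timestamp elsewhere on the page from being picked up.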

The third liquid manipulator creates the field body.

The config for this manipulator uses a liquid template to parse the source HTML for a tag with the class .article-body and return every p tag inside it. The template returns the OuterHTML of the parsed data, so the HTML markup found is kept intact rather than stripped. This value is then set for the body field.
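To show what returning the OuterHTML means in practice, here is a Python sketch that collects every p element inside .article-body and re-serializes each one with its tags and attributes unchanged. It uses only html.parser, with character-reference conversion turned off so entities such as &amp; survive; the helper names, the sample markup, and joining the paragraphs with newlines are assumptions, not the behavior of ImpulseSync's template.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Element node that remembers its raw start tag so it can be re-serialized."""

    def __init__(self, tag, attrs, start, parent=None):
        self.tag = tag
        self.classes = set((dict(attrs).get("class") or "").split())
        self.start = start
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    def __init__(self):
        # Keep entity and character references verbatim instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.root = Node("#root", [], "")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.get_starttag_text(), self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)

    def handle_entityref(self, name):
        self.current.children.append(f"&{name};")

    def handle_charref(self, name):
        self.current.children.append(f"&#{name};")


def find_all(node, match):
    """Depth-first list of descendant elements for which match(element) is true."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if match(child):
                found.append(child)
            found.extend(find_all(child, match))
    return found


def outer_html(node):
    """Serialize an element with its original start tag, children, and end tag."""
    if node.tag in VOID:
        return node.start
    inner = "".join(c if isinstance(c, str) else outer_html(c) for c in node.children)
    return f"{node.start}{inner}</{node.tag}>"


def body(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    paragraphs = []
    for container in find_all(builder.root, lambda n: "article-body" in n.classes):
        paragraphs.extend(outer_html(p) for p in find_all(container, lambda n: n.tag == "p"))
    return "\n".join(paragraphs)  # newline separator is an assumption


sample = ('<div class="article-body"><h2>Skip me</h2>'
          '<p>First <a href="/x">link</a> &amp; more.</p>'
          '<aside>Ad</aside><p>Second.</p></div>')
print(body(sample))
# <p>First <a href="/x">link</a> &amp; more.</p>
# <p>Second.</p>
```

Because each p is re-serialized rather than reduced to text, inline markup such as the link survives, while non-paragraph elements like the h2 and aside are left out.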

The relationship manipulator is used to create the field author.

The manipulator is configured to create a relationship between the content and the author it was previously aligned with (that is, the author the content has an ID map to).
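The lookup the relationship manipulator depends on can be sketched in Python as a dictionary lookup keyed by the source item's ID. The `add_author_reference` name, the `{"id": ...}` reference shape, and the plain-dict ID map are hypothetical; the actual reference format is determined by ImpulseSync and the destination connector.

```python
def add_author_reference(fields, source_id, id_map):
    """Copy `fields` and add an `author` reference if the source item has an ID map entry.

    Hypothetical sketch: id_map is a plain {source_id: destination_id} dict and a
    reference is represented as {"id": destination_id}.
    """
    result = dict(fields)
    dest_id = id_map.get(source_id)
    if dest_id is not None:
        result["author"] = {"id": dest_id}
    return result


id_map = {"article-42": "author-7"}  # as produced by an earlier alignment job
print(add_author_reference({"html_title": "Sample"}, "article-42", id_map))
# {'html_title': 'Sample', 'author': {'id': 'author-7'}}
print(add_author_reference({"html_title": "Other"}, "article-99", id_map))
# {'html_title': 'Other'}
```

If no ID map entry exists (for example, the alignment job has not run or found no exact match), the sketch leaves the author field unset rather than guessing.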

Once both jobs have run, the result is a structured content entry whose fields are populated with data parsed from the source HTML.

ImpulseSync can solve this with two jobs using the liquid-field and relationship manipulators. The first job aligns the article to the destination author content, creating an ID map in Impulse. The second job syncs the HTML and parses it into multiple fields.