Migration XML Source
2019 13 Jun

Content migration in Drupal from XML Source

One of our recent posts covers the Migrate API in Drupal 8: the basics of migration and how to migrate data for various field types from a CSV source. Some frameworks and CMSs, such as WordPress, allow data to be exported in XML format (or in JSON format if certain extensions are available). More often than not, another agency provides us with the site's data as XML, which then needs to be migrated to Drupal. In this post, we look at how content can be migrated to Drupal from an XML source.

Note: It is not a prerequisite, but a basic knowledge of XPath selectors is very helpful.

The main requirements are as follows:

  • Custom module - Contains the migration scripts, which are imported when the module is installed. The module must depend on Migrate, Migrate Plus and Migrate Tools.
    The main reason for using a custom module is that once the content has been migrated successfully and needs no further updates, uninstalling the module removes the imported migration scripts without affecting any other workflow on the site.
  • Source XML file - This file can be external (it must then be accessible over HTTP) or stored locally (in the private file directory).

Writing migration scripts:

Assuming familiarity with the basic elements of a migration script, such as id, label and migration group, we move on to its most important components:
Source ~ Process ~ Destination

Let’s have a look at the important parameters that need to be defined.

  • Source

    • plugin - The source plugin used; here, ‘url’.
    • data_fetcher_plugin - Defines how the source data is retrieved: from a plain file path or URL (‘file’), or over an HTTP connection that can carry headers and authentication (‘http’).
    • data_parser_plugin - Defines the format used to parse the source data, such as JSON, XML or SOAP.
    • urls - One or more URLs of the source file, or storage paths that use a stream wrapper (e.g. private://).
    • item_selector - The XML is parsed into nodes that are queried with XPath selectors. This XPath expression selects the nodes that each represent one item to be migrated.
    • fields - Maps the source data to machine names that can be used in the process part of the migration. Each field has three keys: name, label and selector.
      • ‘name’ - a unique name that identifies the field in other parts of the migration.
      • ‘label’ - a human-readable description of the data (optional; used for display in migration tools).
      • ‘selector’ - the XPath selector, relative to the path defined in item_selector, that extracts the field's data from the source file.
    • ids - Defines the source field(s) that uniquely identify each item; these are used as keys in the migration map tables.
  • Process

    This part follows the same general pattern for any migration, whatever the source plugin. The values mapped under the ‘fields’ parameter of the source plugin are manipulated using various process plugins and then assigned to fields of the entities being migrated. Through chaining, the result of one process plugin can be used as the input for another.
  • Destination

    The destination plugin defines the target entity to be created from the data produced by the process plugins.
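The idea of chaining can be modelled in a few lines of Python. This is a hypothetical sketch, not Migrate API code. Each "plugin" is a plain function (trim, lowercase and spaces_to_hyphens are made-up examples), and run_chain feeds the output of one function into the next. That is what happens when several process plugins are listed under one destination field.

```python
from functools import reduce
from typing import Callable, Sequence

# A hypothetical "process plugin": a plain function from one value to the next.
# These stand-ins only model the idea; Drupal's process plugins are PHP classes.
ProcessPlugin = Callable[[str], str]


def run_chain(value: str, plugins: Sequence[ProcessPlugin]) -> str:
    """Pass the value through each plugin in order, like a process pipeline."""
    return reduce(lambda current, plugin: plugin(current), plugins, value)


def trim(value: str) -> str:
    return value.strip()


def lowercase(value: str) -> str:
    return value.lower()


def spaces_to_hyphens(value: str) -> str:
    return value.replace(" ", "-")


# The output of trim() becomes the input of lowercase(), and so on.
print(run_chain("  The Lost Symbol ", [trim, lowercase, spaces_to_hyphens]))
# prints: the-lost-symbol
```

Order matters. If spaces_to_hyphens ran before trim, the leading and trailing spaces would already have become hyphens, and trim would not remove them.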

Here’s an example that walks through the migration of files.

To create a file entity in Drupal, the following attributes are required: filename, uri, uid and status.
We can define default values for uid and status, but filename and uri are specific to each file, so these values must be extracted from the source XML file during the migration.

The sample below contains data for files related to books. Each <item> node represents a file and has a title, a location and a unique ID.

<books>
  <item>
    <title>In Search of Lost Time</title>
    <fid>1</fid>
    <link><![CDATA[https://www.demo.com/images/books/cover-image/search-lost-time.png]]></link>
  </item>
  <item>
    <title>The Lost Symbol</title>
    <fid>2</fid>
    <link><![CDATA[https://www.demo.com/images/books/cover-image/lost-symbol.jpg]]></link>
  </item>
  <item>
    <title>The Alchemist</title>
    <fid>3</fid>
    <link><![CDATA[https://www.demo.com/images/books/cover-image/alchemist.png]]></link>
  </item>
</books>

Sample XML data
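The Python model below shows what item_selector and the relative field selectors pick out of this sample. It uses the standard library's ElementTree and only approximates the Migrate Plus XML parser, which is PHP code. FIELDS and extract_rows are names made up for this sketch.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<books>
  <item>
    <title>In Search of Lost Time</title>
    <fid>1</fid>
    <link><![CDATA[https://www.demo.com/images/books/cover-image/search-lost-time.png]]></link>
  </item>
  <item>
    <title>The Lost Symbol</title>
    <fid>2</fid>
    <link><![CDATA[https://www.demo.com/images/books/cover-image/lost-symbol.jpg]]></link>
  </item>
  <item>
    <title>The Alchemist</title>
    <fid>3</fid>
    <link><![CDATA[https://www.demo.com/images/books/cover-image/alchemist.png]]></link>
  </item>
</books>"""

# Mirrors the 'fields' section: source field name -> selector relative to each item.
FIELDS = {"fid": "fid", "url": "link"}


def extract_rows(xml_text: str) -> dict[int, dict[str, str]]:
    """Select each <item> and read its fields, keyed by the integer 'fid' ID."""
    root = ET.fromstring(xml_text)  # the root element is <books>
    rows: dict[int, dict[str, str]] = {}
    # ElementTree supports only a subset of XPath; selecting the <item> children
    # of the <books> root is equivalent to the item_selector '/books/item'.
    for item in root.findall("item"):
        # CDATA sections are returned as ordinary text.
        row = {name: (item.findtext(selector) or "").strip()
               for name, selector in FIELDS.items()}
        rows[int(row["fid"])] = row
    return rows


rows = extract_rows(SAMPLE_XML)
print(rows[3]["url"])
# prints: https://www.demo.com/images/books/cover-image/alchemist.png
```

Keying the rows by fid plays the role of the ‘ids’ setting. Each source item gets a stable identifier that a map table could link to the entity created from it.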

Migration template:

  1. SOURCE:

    source:
      # Use the 'url' source plugin (from Migrate Plus) with the XML data parser.
      plugin: url
      data_fetcher_plugin: http
      data_parser_plugin: xml
      urls: 'private://books/files.xml'
      # The XPath to use to query the desired elements.
      item_selector: /books/item
      # Under 'fields', we list the data items to be imported. The 'name' values
      # are the source field names we want to populate (the names used as
      # sources in the process configuration below). For each field we are
      # importing, we provide a label (optional - this is for display in migration
      # tools) and an XPath for retrieving that value. This XPath is relative to
      # the elements retrieved by item_selector.
      fields:
        -
          name: fid
          label: 'File ID'
          selector: fid
        -
          name: url
          label: 'File Link'
          selector: link
      # Under 'ids', we identify source fields populated above which will uniquely
      # identify each imported item. The 'type' makes sure the migration map table 
      # uses the proper schema type for storing the IDs.
      ids:
        fid:
          type: integer
      # Constants can be defined here and referenced in the process section
      # as constants/<name>.
      constants:
        file_dest_uri: 'public://books/images'
    ...
    • The source plugin used is ‘url’.
    • The template uses ‘http’ as the ‘data_fetcher_plugin’, which supports request headers and authentication when the source file is fetched from a remote server. For a file stored locally, as here, the ‘file’ fetcher, which reads local paths and stream wrappers, is the more natural choice.
    • Since the data to be parsed is in XML format, the ‘data_parser_plugin’ used is ‘xml’.
    • The source file, ‘files.xml’, is stored locally in the private directory, so we access it with the private:// stream wrapper.
    • Each <item> node in the sample data maps to one file entity to be migrated, so we select these nodes in ‘item_selector’ with the XPath expression ‘/books/item’.
    • We choose the destination file location ourselves. From the source we therefore need only the file location (to download and save the file) and a unique ID (to identify the mapping in the migration tables for lookups). Both are defined under the ‘fields’ parameter.
        <item>
          <title>In Search of Lost Time</title>
          <fid>1</fid>
          <link><![CDATA[https://www.demo.com/images/books/cover-image/search-lost-time.png]]></link>
        </item>
        
        <item>
          <title>The Lost Symbol</title>
          <fid>2</fid>
          <link><![CDATA[https://www.demo.com/images/books/cover-image/lost-symbol.jpg]]></link>
        </item>
      

      Each parent node ‘/books/item’ represents one file entity, and the source file location can be obtained from the ‘link’ node.
    • Since the ‘fid’ field is unique, it can be used as the source ID for mapping in the migration tables, so we specify it under the ‘ids’ parameter.
  2. PROCESS:

    ...
    
    process:
      # Assign 'url' value to a temporary variable.
      file_source: url
      # Using the 'explode' plugin and '/' as delimiter on file URL, we obtain 
      # an array with the file name as the last element and using 'array_pop' plugin
      # we get the file name with extension.
      temp_name:
        -
          plugin: explode
          source: '@file_source'
          delimiter: /
        -
          plugin: array_pop
      # Using 'concat' plugin with uri (defined under 'constants') and filename,
      # we get the destination file path. The 'urlencode' plugin is used to form 
      # a valid destination URL for the file.
      file_destination:
        -
          plugin: concat
          delimiter: /
          source:
            - constants/file_dest_uri
            - '@temp_name'
        -
          plugin: urlencode
      # Copy the temp_name value to the 'filename' attribute.
      filename: '@temp_name'
      # Copy the file from the source location and add it to the destination 
      # using 'file_copy' plugin. If a file already exists, we can specify if 
      # the file should be replaced by the new file or the new file should 
      # be renamed and then copied.
      uri:
        plugin: file_copy
        source:
          - '@file_source'
          - '@file_destination'
        file_exists: replace
      uid:
        plugin: default_value
        default_value: 1
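      # 1 marks the file as permanent; temporary (0) files that have no usage
      # can be deleted during cron runs.
      status:
        plugin: default_value
        default_value: 1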
    
    ...
    • Filenames are not given explicitly in the source XML, so we derive the ‘filename’ value from the source file URL. The ‘explode’ and ‘array_pop’ plugins apply PHP's explode() and array_pop() functions to the source data.
    • The ‘concat’ plugin joins the specified URI and file name to create the complete destination path for the file. This path is then URL-encoded with the ‘urlencode’ plugin.
    • The ‘file_copy’ plugin copies the file from the source to the destination. Its ‘file_exists’ parameter specifies what to do if a file already exists at the destination: the existing file can be replaced, or the new file can be renamed, as required.
  3. DESTINATION:

    ...
    
    destination:
      plugin: entity:file
    migration_dependencies: {}
    # Under this, we define module dependencies. This ensures that the migration
    # configuration is removed when the module is uninstalled after the
    # migration has completed successfully.
    dependencies:
      enforced:
        module:
          - custom_migration
    
    • Since the target entity to be created is a file, we specify ‘entity:file’ as the destination plugin.
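The values that the process section derives for one row can be traced with a rough Python equivalent. The functions below are hypothetical stand-ins named after the process plugins, not the plugins themselves. urlencode only approximates how the ‘urlencode’ plugin encodes the path, and file_copy is not modelled, so nothing is downloaded.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Value of constants/file_dest_uri in the template.
FILE_DEST_URI = "public://books/images"


def explode(value: str, delimiter: str) -> list[str]:
    """Like PHP explode(): split a string on a delimiter."""
    return value.split(delimiter)


def array_pop(values: list[str]) -> str:
    """Return the last element, as PHP array_pop() does."""
    return values[-1]


def concat(values: list[str], delimiter: str = "") -> str:
    """Join the values with the delimiter, like the 'concat' plugin."""
    return delimiter.join(values)


def urlencode(url: str) -> str:
    """Percent-encode the path of a URL while keeping '/' separators."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/")))


def process_row(row: dict[str, str]) -> dict[str, str]:
    """Derive the values the process section computes for one source row."""
    file_source = row["url"]
    temp_name = array_pop(explode(file_source, "/"))
    file_destination = urlencode(concat([FILE_DEST_URI, temp_name], "/"))
    return {
        "file_source": file_source,
        "filename": temp_name,
        "file_destination": file_destination,
    }


result = process_row(
    {"fid": "1", "url": "https://www.demo.com/images/books/cover-image/search-lost-time.png"}
)
print(result["file_destination"])
# prints: public://books/images/search-lost-time.png
```

The destination is built from the last URL segment only. Two source URLs that end in the same file name would therefore map to the same destination URI, and with file_exists: replace the second file would overwrite the first.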

This template is a basic configuration for migrating files from an XML source. Migration templates for other entities can be written following the same strategy. First, we identify the key fields whose data is to be extracted and define XPath selectors to obtain their values in the source section. Second, we operate on these values using one or more process plugins and map the results to the entity fields. Third, we define the type of entity to be created in the destination section.
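The three steps can also be modelled as a small, template-driven Python sketch. Here it is applied to a different entity: node-like records holding each book's title. TEMPLATE and run_migration are hypothetical and only loosely mirror the YAML sections; a real migration is a Migrate API configuration like the one above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample data.
XML = """<books>
  <item><title>In Search of Lost Time</title><fid>1</fid></item>
  <item><title>The Alchemist</title><fid>3</fid></item>
</books>"""

# Hypothetical template loosely mirroring the three sections of a migration.
TEMPLATE = {
    "source": {
        "item_selector": "item",  # <item> children of the <books> root: /books/item
        "fields": {"fid": "fid", "title": "title"},
        "id": "fid",
    },
    "process": {"title": "title"},  # destination field -> source field
    "destination": "node",  # only a label here; nothing is saved
}


def run_migration(xml_text: str, template: dict) -> dict[int, dict[str, str]]:
    source = template["source"]
    root = ET.fromstring(xml_text)
    results: dict[int, dict[str, str]] = {}
    for item in root.findall(source["item_selector"]):
        # 1. Source: extract values with selectors relative to each item.
        row = {name: item.findtext(selector, default="").strip()
               for name, selector in source["fields"].items()}
        # 2. Process: map source values onto destination fields.
        entity = {dest: row[src] for dest, src in template["process"].items()}
        # 3. Destination: record which entity type would be created.
        entity["entity_type"] = template["destination"]
        results[int(row[source["id"]])] = entity
    return results


print(run_migration(XML, TEMPLATE)[3])
# prints: {'title': 'The Alchemist', 'entity_type': 'node'}
```

Only the template changes between entity types; the runner stays the same. This mirrors how each new migration reuses the same source, process and destination structure.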

If you have any questions or suggestions related to this post, please let us know in the comments.
