Skip to main content

Bulk uploading using the `bulk_upsert` API

The bulk_upsert API operation enables you to update multiple content posts at once.

To do this, you have to import data into the Content module in CSV format. The uploaded CSV file is subject to the following conditions:

  1. It has to be uploaded manually to KurocoFiles.
  2. The uploading process is batch executed at midnight (0:00) every day.
  3. A new uploading process will be initiated if you update any files between 0:00 the previous day and 0:00 today.
  4. If there is existing content, it will be updated by the uploaded data. Otherwise, new content will be created.
  5. The uploaded CSV file will be encoded in UTF-8.

CSV file

For this tutorial, we will upload the following list of mobile devices.

item_numberitem_namecategorydescriptionstatusitem_colorrelease_dateis_public
00000001SmartPhoneSPSmartphone1black,white,red,blue2020/12/10TRUE
00000002SmartPhone LiteSPpopular edition Smartphone1black,white2021/12/10TRUE
00000003TabletTBTablet0silver2020/1/15FALSE
00000004Tablet 2TBTablet (2nd gen)1silver2022/1/15TRUE

Below is an explanation of each column in this CSV file.

ColumnLabelInput typeValue
item_numberItem codeNumberA unique 8-digit number for each item.
item_nameItem nameTextName or type of the item.
categoryItem categoryText
  • SP: Smartphone
  • TB: Tablet
descriptionItem descriptionTextA description or comment about the item.
statusAvailabilityNumber
  • 0: Out of stock
  • 1: Available
item_colorColor(s)TextColor of the item. Separate multiple values for the same item with commas.
release_dateRelease dateDate (yyyy/mm/dd)Item release date.
is_publicPublic statusText
  • TRUE: Public
  • FALSE: Private

Content structure setup

To import each row of the CSV file as a separate content post, we need to create a content structure containing fields that match the columns of the uploaded data.

If a corresponding default field exists, use the default field.
Otherwise, create an additional field and set the CSV column name as the identifier.

1. Create a content structure

Start a new content structure.
In the default part of the editor, enter the following:

FieldValue
NameMobile devices
Do not store update historySelect "Enable".

Image from Gyazo

Image from Gyazo

By default, the content structure contains the fields below.

Field nameIdentifier (CSV column name)API endpoint parameterDescription
ID/Slugitem_numberslugWhen sending requests to the bulk_upsert endpoint, slug needs to have a text value. Therefore, a prefix is automatically added to yield this input format: ITEM-%%item_number%%.
Daterelease_dateymd-
Categorycategorycontents_type-
Titleitem_namesubject-
Publish settingis_publicopen_flgA value of TRUE corresponds to a published content status, while FALSE corresponds to unpublished.

2. Add the corresponding fields

Next, set up the additional fields as follows:

Field IDField nameIdentifier (CSV column name)Format (field settings)Description
1Item codeitem_numberSingle-line textThis is the same CSV column that is mapped to "ID/Slug", but here the original numerical value is retrieved without the prefix.
2DescriptiondescriptionMulti-line text-
3AvailabilitystatusSingle choiceSet the following options:
  • 0::Out of stock
  • 1::Available
4Color(s)item_colorMultiple choiceSet the following options:
  • black::black
  • white::white
  • silver::silver
  • red::red
  • blue::blue
note

For the additional fields, the endpoint parameters will be the same as the custom-defined identifiers.

Image from Gyazo Image from Gyazo Image from Gyazo Image from Gyazo Image from Gyazo

When you are done, click [Add] at the bottom of the screen the save this content structure.

Image from Gyazo

3. Create categories

Finally, create categories to classify your mobile devices. "Category name" is the item category value, and "Field 01" is the key.

Category IDCategory nameField 01
(Auto-generated)SmartphoneSP
(Auto-generated)TabletTB

Image from Gyazo

Create an admin account to run the back-end processes

Now we need to create an account to run the back-end process. The member ID will be used in the batch process.

Create a new member

Open the Member editor and set up a new account as follows:

FieldValueDescription
NameSystem AdminEnter a value for back-end use only.
E-mail(your e-mail)Enter your e-mail address.
Login IDsystemEnter a value for back-end use only.
Password(any password)Set a strong, unique password.
GroupAdminSelect a group with superuser privileges. For this tutotial, we are using the default group "Admin" (group ID = 1).

Image from Gyazo

Image from Gyazo

Define the constants

You also need to add a new constant for the batch process.

FieldValue
NameSYSTEM_MEMBER_ID
Value(Enter the member ID of the "System Admin" account you created above.)

Image from Gyazo

API settings

Create a new API

Access the API screen and create a new API to set the endpoint for backend processing.

tip

Mixing endpoints with different purposes (for front-end, back-end, etc.) in the same API complicates authentication settings. Since this may increase security risks, it is recommended to separate the API settings by usage.

Image from Gyazo

Item nameValueDescription
titleInternal APISet a value that makes it easy to identify that it is for backend use.
version1.0
descriptionInternal API for Backend Process

After creating the API, click [Security], select [Dynamic Access Token], and save.

Image from Gyazo

Add the bulk_upsert endpoint

Click the [Add new endpoint] button to add the bulk_upsert endpoint to the API you just set up.

Image from Gyazo

Item nameValueDescription
Pathmobile_devices/bulk_upsert
Modelcategoryー: Content
Model: Topics (v1)
Operation: bulk_upsert
API request restrictionGroupAuth (Admin)Set a group that the member created previously at Create a new member belongs to.

For the basic settings and advanced settings, set the following values.

Image from Gyazo

Parameter nameValueDescription
topics_group_id14ID of the topics group to be updated. Set the ID numbered to the topics group "mobile device" created previously.
id_reference_allow_listslugSet items that can be specified as keys when updating content. Detail is explained below.
ignore_errorstrueIgnores rows with validation errors and add/update only valid content.

Regarding id_reference_allow_list parameter

When updating existing content with the bulk_upsert API, you need to specify the topics_id which is the key to identify the target content. Normally, we specify the number automatically assigned by Kuroco as follows.

{
"topics_id": 1,
"slug": "ITEM-00000001",
"subject": "Smartphone",
...
}

The problem here is how to identify the topics_id to be updated.

For other items, values can be converted from the CSV file, but topics_id is data held only by Kuroco. In order to identify this value, it is necessary to call the list API in advance to obtain the existing content. However, implementing this has the following problems:

  • Program gets complicated
  • Process time increases

id_reference_allow_list is a parameter prepared to solve the above. If it is set, you can add or update content using any item as a key instead of topics_id. For example, if slug is set like this time, the following requests can be specified.

{
"topics_id": "slug",
"slug": "ITEM-00000001",
"subject": "Updated Title",
...
}

If the above data is sent, if the content of slug = "ITEM-00000001" already exists, it will be updated, otherwise it will be newly added. This makes it possible to add content using only the ID of the original data without considering the topics_id numbered by Kuroco.

Implement batch process

Using the settings set up so far, we will implement batch process that executes daily import.

First, access the Batch editor screen and enter the following contents.

Image from Gyazo

Item nameValue
Titleupsert_mobile_devices
Identifierupsert_mobile_devices
TypeEvery day/00:00

After entering above, enter the following code in the Process field and click [Add].

{*
Pre-processing
*}
{* Authenticate using the member ID set in the constant *}
{login member_id=$smarty.const.SYSTEM_MEMBER_ID overwrite=true}

{* Check if CSV file is placed *}
{assign var='uploaded_csv_path' value='/files/ltd/bulk_upsert/mobile_devices.csv'}
{if !$uploaded_csv_path|rcms_file_exists}
{logger msg1='upsert_mobile_devices' msg2='CSV file is not found'}
{return}
{/if}
{* Check the update date and time of CSV file *}
{assign var='csv_updated_at' value=$uploaded_csv_path|rcms_file_mtime}
{if $csv_updated_at < '-1 day 0:00:00'|strtotime}
{logger msg1='upsert_mobile_devices' msg2='CSV file is not updated'}
{return}
{/if}

{* %% bulk_upsert %% *}

Next, replace the commented part {* %% bulk_upsert %% *} in the code above with the actual content update process.

First, access the Swagger UI screen and select the endpoint mobile_devices/bulk_upsert created previously.

Image from Gyazo

Click [Request body] -> [Schema] to see the definition of the request body accepted by the bulk_upsert endpoint.

Image from Gyazo

The bulk_upsert API accepts request bodies in the following two formats. This time, we will proceed with processing using the JSON format.

FormatRequest body
JSON{"list": [{...}]}
CSV file{"file": {...}, "encoding": "..."}

When updating in JSON format

You can see a detailed definition of each item by expanding the list property from the Request body schema.

Image from Gyazo

Comparing the above schema with the items we are updating, we can see that the request body should look something like this:

{
"list": [
{
"topics_id": "slug",
"slug": "ITEM-00000001",
"subject": "SmartPhone",
"contents_type": 78,
"open_flg": 1,
"ymd": "2020-12-10",
"item_number": "00000001",
"description": "Smartphone",
"status": "1",
"item_color": ["black", "white", "red", "blue"]
},
{
"topics_id": "slug",
"slug": "ITEM-00000001",
// ...
},
// ...
]
}

Now that we know the format of the request body to pass to the endpoint, we can implement batch process. Add the following code to the batch process created earlier and click the [Update] button.

{*
Initiate necessary constants
*}
{* json body ({"list": []}) passed to bulk_upsert endpoint *}
{assign_array var='bulk_upsert_body' values=''}
{assign_array var='bulk_upsert_body.list' values=''}

{assign_array var='csv_header' values=''}{* CSV header ([]) *}
{assign var='chunk_unit' value=1000}{* Unit for split upload *}

{*
Get the total number of rows of CSV to be processed in advance
*}
{assign var='last_index' value=-1}
{read_file name='uploaded_csv' row='csv_row' type='csv' path=$uploaded_csv_path}
{assign var='last_index' value=$last_index+1}
{logger msg1="last_index取得処理" msg2=$last_index}
{/read_file}

{*
Update process
*}
{* index indicating the row being processed *}
{assign var='i' value=0}
{* Read CSV file *}
{read_file name='uploaded_csv' row='csv_row' type='csv' path=$uploaded_csv_path}
{if !$csv_row|@is_array}
{logger msg1='upsert_mobile_devices' msg2='Invalid csv row' msg3=$csv_row}
{elseif $i === 0}
{* Get CSV header *}
{assign var='csv_header' value=$csv_row}
{$csv_header|@rcms_json_encode}
{logger msg1="CGet CSV header" msg2=$csv_row}
{else}
{logger msg1="Get CSV header"}
{* CSV行の変換 *}
{assign_array var='topics' values=''}
{assign var='topics.topics_id' value='slug'}{* add/update slug as key *}
{foreach from=$csv_row key='k' item='v'}
{assign var='col_name' value=$csv_header[$k]}{* Get CSV Item name *}
{if $col_name == 'item_number'}
{* Product no. *}
{assign var='topics.slug' value="ITEM-`$v`"}
{assign var='topics.item_number' value=$v}
{elseif $col_name == 'item_name'}
{* Item name *}
{assign var='topics.subject' value=$v}
{elseif $col_name == 'category'}
{* category *}
{if $v == 'SP'}
{assign var='topics.contents_type' value=78}
{elseif $v == 'TB'}
{assign var='topics.contents_type' value=79}
{/if}
{elseif $col_name == 'item_color'}
{* color *}
{assign var='topics.item_color' value=','|explode:$v}
{elseif $col_name == 'release_date'}
{* release date *}
{strtodate var='topics.ymd' format='Y-m-d' timestamp=$v}
{elseif $col_name == 'is_public'}
{* public/private *}
{if $v == 'TRUE'}
{assign var='topics.open_flg' value=1}
{else}
{assign var='topics.open_flg' value=0}
{/if}
{else}
{* Others *}
{assign var="topics.`$col_name`" value=$v}
{/if}
{/foreach}
{* Add to JSON body ({"list": [..., {...}]}) *}
{assign var='bulk_upsert_body.list.' value=$topics}
{/if}
{* Update by dividing for each number defined in $chunk_unit *}
{if $bulk_upsert_body|@count === $chunk_unit ||
($i === $last_index && $bulk_upsert_body|@count > 0)}
{* Request to bulk_upsert endpoint (Add _async=true parameter and execute in batch process) *}
{api_internal
var='bulk_upsert_response'
status_var='bulk_upsert_status'
endpoint='/rcms-api/8/mobile_devices/bulk_upsert?_async=true'
method='POST'
queries=$bulk_upsert_body
member_id=$smarty.session.member_id}
{* Output to log if failed *}
{if !$bulk_upsert_status || $bulk_upsert_response.errors}
{logger msg1='upsert_mobile_devices' msg2='Request failed' msg3="index: `$i`" msg4=$bulk_upsert_response}
{/if}
{* JSON bodyの初期化 ({"list": []}) *}
{assign_array var='bulk_upsert_body.list' values=''}
{/if}
{logger msg1=$i msg2=$last_index msg3=$topics msg4=$csv_row}
{assign var='i' value=$i+1}
{/read_file}
caution

Replace /rcms-api/8/mobile_devices/bulk_upsert with your own endpoint.
Use the category ID in your environment for {assign var='topics.contents_type' value=78}{assign var='topics.contents_type' value=79}.

We supplement the details of the processing.

About reading CSV files

read_file is a plugin for reading text data line by line. It can be used to read CSV files by specifying csv for the type parameter.
The character code that can be read by read_file is UTF-8 only.

{read_file name='uploaded_csv' row='csv_row' type='csv' path=$uploaded_csv_path}
{* ... *}
{/read_file}

The CSV row data is assigned to the variable name $csv_row specified by the row parameter. You can check the contents of the data by writing the following process inside the {read_file}{/read_file} block.

{$csv_row|@rcms_json_encode}

The following array data is output. The request body is generated by transforming these values based on the field definition.

["00000001", "SmartPhone", "SP", "Smartphone", "1", "black,white,red,blue", "2020/12/10", "TRUE"]

About adding/updating contents

The api_internal plugin is used to call the bulk_upsert endpoint for adding/updating content.
Please refer to Can I call Kuroco's API from custom function? for more information on how to use it.

{api_internal
var='bulk_upsert_response'
status_var='bulk_upsert_status'
endpoint='/rcms-api/23/mobile_devices/bulk_upsert?_async=true'
method='POST'
queries=$bulk_upsert_body
member_id=$smarty.session.member_id}

The endpoint path has the _async=true parameter to execute API processing asynchronously.

Calls to endpoints are typically synchronous. After submitting the request, you need to wait for the processing to complete. However, since the bulk_upsert API handles a large amount of content at once, depending on the number of CSV data, it may take some time to complete the process, resulting in a timeout.

If you use the _async=true parameter, it will only register the batch process and return the response immediately without executing the addition/update process at the time of the request. The processing of the called API is executed in a separate process from the caller, so timeout issues can be avoided. Specify this when the number of data to be updated is large.

Check the operation

Now that the settings have been completed, let's check the operation.
First, place the CSV file (mobile_devices.csv) in the directory (/ltd/bulk_upsert) specified by batch processing.   Since the KurocoFiles (Private) folder will be ltd, create a bulk_upsert folder under it and set the CSV file.

Image from Gyazo

Next, access the editor screen created previously and click [Run now].
Image from Gyazo

If you check the contents list of "mobile device", you can see that the contents are registered from CSV.

Image from Gyazo

Confirmation of operation is completed.


Support

If you have any other questions, please contact us or check out Our Slack Community.