Laravel Upserts: How to batch import data efficiently
Released on 13 October 2020, Laravel 8.10 came with the addition of the new
upsert() method on the database query builder (and therefore Eloquent based queries too). This allows you to perform "insert or update" queries far more efficiently - which will help when importing large batches of data.
TLDR: Why use it?, How does it work?, How does it perform? How do I use it?
A practical example of using upsert()
Up until now, the Laravel database query builder supported the insertion or updating of multiple rows (batches of data) at once via the
update() methods. There was also a lesser known
insertOrIgnore() method which would insert a batch of data, ignoring any errors (such as a unique key already existing).
We also had the
updateOrCreate() method on Eloquent models which given a set of attributes, updated an existing model, or if it didn't exist, it created it. However, this could only be used on a single row / model at a time - if we had 125,000 rows to insert or update, we'd have to call that Eloquent method 125,00 times. Behind the scenes, this actually executes two queries in the database - one to check for the existing entry and another to perform the
UPDATE - so that would mean running 250,000 queries. That could be slow.
So let's imagine we have a CSV file containing data for 125,000 products that we needed to import into our database as efficiently as possible. No problem, we can use the
insert() method on the query builder to insert multiple rows at the same time.
Fast forward to 4 weeks later, we have a new CSV file of product data - the details of the 125,000 products have been updated and 25,000 new products have been added. In our database we need to update the existing products and insert the new ones. To do this, we need to know which products we already have, and which are new so we can perform either an
INSERT or an
Rather than using the
updateOrCreate() 150,000 times for each row in the CSV file, let's start by breaking down the CSV file into smaller batches, of say 5,000 products. That gives us 30 batches.
We can then pass each batch of product data to the the new
upsert() method on the
Illuminate\Database\Query\Builder class, which for each row of data in the CSV file will either update an existing product in our database, or create a new one. This way we will only execute 30 queries rather than 300,000 if we'd used the
updateOrCreate() method (remember, it's actually 2 queries for each use of it).
So how does upsert() work?
I'm going to assume you're using MySQL, although it does work with other database engines.
Following on from our example of a products database, simplified down for a batch of just 3 products, this is the SQL that would be run:
INSERT INTO products (item_ref, title, price) VALUES (1, 'Large cake', 15.00), (1, 'Medium cake', 10.00), (1, 'Small cake', 5.00) ON DUPLICATE KEY UPDATE item_ref = values(item_ref), title = values(title), price = values(price);
As you can see, the query starts with a normal
INSERT statement and it will attempt to insert the first set of data values into the table. If that's successful, it will remove on to the next set. However, if that fails because the unique identifier (in our example, that's "item_ref") already exists (i.e the row of data is already in the table), then it will be caught by the
ON DUPLCIATE KEY part of the query. The SQL following that, in this case an
UPDATE statement, will then get run. This will update the existing database row with the set of values. Once complete, it will move onto the next set of data values.
What about the performance of upsert()?
To test the performance of the new
upsert() method, I started a new Laravel project and created a simple artisan command that performs 3 tests - one using the
updateOrCreate method, one using the
upsert() method for singular rows, and one for using
upsert() method with batches of rows (5,000 at a time).
For each test, an array of 125,000 "products" are written to an empty table - measuring the time it takes to complete. Every item in the original array of product data is then updated with a new price, and a further 25,000 products added. This updated raw data is then also written to the database whilst being timed.
I spun up a new server on Linode (with PHP 7.4 & MySQL 8) and ran the artisan command 5 times. I averaged out the times from each test, which ended up as follows:
Test 1 (
- INSERT: 280.84 seconds
- INSERT / UPDATE: 398.76 seconds
Test 2 (
upsert() method - singular rows)
- INSERT: 203.99 seconds
- INSERT / UPDATE: 270.39 seconds
Test 3 (
upsert() method - batched rows)
- INSERT: 15.38 seconds
- INSERT / UPDATE: 17.97 seconds
As expected the first tests with
updateOrCreate() method was slowest. The second test, using
upsert() for each product one-at-a-time was a little quicker as it only executed 1 query for each product, rather than 2. Finally, the third test, using
upsert() for batches (5,000) of products was vastly quicker than the other two tests. In my tests, it works out at ~95% quicker for both
INSERT queries and
INSERT / UPDATE queries when compared to
That's great, how can I use upsert()?
The method is available of the Laravel database query builder, and the Eloquent query builder. It accepts 3 arguments:
upsert(array $values, $uniqueBy, $update = null)
The first argument is an array of arrays containing the data to be inserted / updated - one child array for each Eloquent model / database row. For example:
$values = [ [ 'item_ref' => 1, 'title' => 'Large cake', 'price' => 15.00, ], [ 'item_ref' => 2, 'title' => 'Medium cake', 'price' => 10.00, ], [ 'item_ref' => 3, 'title' => 'Small cake', 'price' => 5.00, ] ];
If you're using
upsert() on an Eloquent model that uses timestamps, the
updated_at attributes will automatically be handled.
The second argument describes how to identify a unique row.
This could be the name of a single column (passed as a string, or an array containing a single string), or if you have a composite key (two or more columns used to uniquely identify a row), the names of multiple columns passed as an array. For example:
$uniqueBy = 'item_ref'; // OR $uniqueBy = ['item_ref']; // OR $uniqueBy = ['item_type', 'item_size'];
All database engines except SQL Server require the columns in the $uniqueBy argument to have a "primary" or "unique" index.
The third and final argument, which is optional, defines the columns that should be updated if a matching row already exists in the database. This allows you to include data in your
$values argument that will be written only on an
INSERT. By default all columns in your
$values are updated, but here is an example if you do want to provide this argument:
$update = ['title', 'price'];
I hope you've found this useful - if you do want to know more you can read the docs (although I've covered everything and more in this blog post) or review the original PRs on GitHub: #34698, #34712.
I'm @gbuckingham89 on Twitter if you have any questions about using
upsert() or anything else Laravel related.
If you're looking to work with a Laravel developer, please do contact me.