Efficient Data Import in PostgreSQL with Node.js

Vincent Delacourt
3 min read · Jan 9, 2025

When working with large datasets, efficiently importing millions of records into a database can be daunting.

This article walks you through a performant, streaming-based approach using Node.js, PostgreSQL, and CSV files. The goal is to insert one million records into a PostgreSQL database efficiently while keeping resource usage in check.

If you are in a hurry, the full code is available here: https://github.com/vdelacou/postgresql-bulk-data-generator

Key Concepts

  • Streaming: Reading and writing large files incrementally with streams, instead of loading them fully into memory (a short sketch follows).
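
To make the idea concrete, here is a minimal sketch of the contrast, assuming the data.csv file used later in this article:

import { createReadStream } from 'node:fs';
import { readFile } from 'node:fs/promises';

// Full load: the entire file must fit in memory at once.
// const contents = await readFile('./data.csv', 'utf8');

// Streaming: the file arrives in small chunks, so memory stays flat
// no matter how large the file grows.
let bytes = 0;
for await (const chunk of createReadStream('./data.csv')) {
  bytes += chunk.length; // handle each chunk as it arrives
}
console.log(`Processed ${bytes} bytes incrementally`);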

Why This Approach?

This approach uses Node.js’s asynchronous nature combined with PostgreSQL’s COPY command for bulk data import. The key benefits include:

  • Performance: The COPY command is optimized for bulk inserts (see the contrast sketch below).
  • Memory Efficiency: Data is streamed, avoiding excessive memory usage.
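
For contrast, here is a sketch of the naive row-by-row alternative that COPY avoids; insertRowByRow is hypothetical, and the consumers table matches the example later in this article:

import type { PoolClient } from 'pg';

// Naive alternative: one INSERT per row means one network round-trip
// and one statement execution per record. For a million rows, that
// overhead dominates the total runtime.
const insertRowByRow = async (client: PoolClient, rows: [string, string][]): Promise<void> => {
  for (const [name, email] of rows) {
    await client.query('INSERT INTO consumers (name, email) VALUES ($1, $2)', [name, email]);
  }
};

COPY sidesteps all of this by sending every row through a single streamed command.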

Importing CSV Data

The COPY command efficiently loads data from the CSV file into the specified table in the database. Below is the implementation using an efficient streaming pipeline:

import { createReadStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import { from } from 'pg-copy-streams';
import type { PoolClient } from 'pg';

const logger = console; // the repo uses its own logger; console works as a stand-in

const copyCsvToTable = async (client: PoolClient): Promise<void> => {
  // Stream the CSV from disk instead of loading it into memory.
  const fileStream = createReadStream('./data.csv');
  // COPY ... FROM STDIN exposes the query as a writable stream.
  const pgStream = client.query(from(`COPY consumers (name,email) FROM STDIN WITH (FORMAT csv, HEADER true)`));
  try {
    // pipeline connects the two streams and handles backpressure.
    await pipeline(fileStream, pgStream);
    logger.info('CSV data copy completed successfully');
  } catch (error) {
    logger.error('Error during copy:', error);
    throw error;
  }
};

Detailed Steps

  1. Open File Stream: Use the createReadStream function to open a read stream for the CSV file.
  2. Set Up PostgreSQL Stream: Use the COPY command to prepare PostgreSQL to receive the data. The from method from the pg-copy-streams library wraps the query so it can accept streamed input.
  3. Stream Data: Use the pipeline utility to connect the file stream to the PostgreSQL stream, sending the file's contents directly into the database (a usage sketch follows below).
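
To tie these steps together, here is a hypothetical runner; the pg Pool wiring and DATABASE_URL are assumptions, not part of the original repository:

import { Pool } from 'pg';

// Hypothetical connection setup: adjust DATABASE_URL to your environment.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const client = await pool.connect();
try {
  await copyCsvToTable(client);
} finally {
  client.release(); // always return the client to the pool
  await pool.end();
}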

Benefits of Streaming with COPY

  • Efficiency: The COPY command processes the file as a continuous stream, significantly reducing overhead compared to inserting rows individually.
  • Error Handling: The try-catch block ensures any errors during the streaming process are logged and rethrown for proper handling.
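
One prerequisite the snippet takes for granted: the consumers table must already exist. The article does not show its schema, so the following is only a plausible guess:

// Hypothetical schema; the actual repository may define it differently.
const createConsumersTable = async (client: PoolClient): Promise<void> => {
  await client.query(`
    CREATE TABLE IF NOT EXISTS consumers (
      id BIGSERIAL PRIMARY KEY,
      name TEXT NOT NULL,
      email TEXT NOT NULL
    )
  `);
};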

Extra: Generating a fake CSV File

Before the import can run, we need a CSV file containing mock data. This file acts as the data source for importing into PostgreSQL.

import { unlink } from 'node:fs/promises';
import { faker } from '@faker-js/faker';

const config = { itemsCount: 1_000_000 }; // in the repo this comes from a config module

const generateRow = (): [string, string] => [
  faker.person.fullName(),
  faker.internet.email(),
];

const generateCsvFile = async (): Promise<void> => {
  const csvFile = Bun.file('./data.csv');
  // Delete any previous run's output so we start fresh.
  if (await csvFile.exists()) {
    await unlink('./data.csv');
  }
  // Bun's incremental writer flushes to disk as it goes.
  const writer = csvFile.writer();
  writer.write('name,email\n');
  for (let i = 0; i < config.itemsCount; i++) {
    const row = generateRow();
    writer.write(`${row.join(',')}\n`);
  }
  writer.end();
};

Detailed Steps

  1. Check for Existing File: If the CSV file already exists, delete it to ensure a fresh start.
  2. Open a Writer Stream: Use a writer stream to incrementally write data to the CSV file.
  3. Generate Mock Data: Use the faker library to create realistic names and emails for each row.
  4. Write Data to CSV: Write each row to the file, separating columns with commas (see the escaping note after this list).
  5. Close the Stream: Once all rows are written, close the writer to finalize the file.
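
A caveat worth noting: the generated values are written unquoted, so a name containing a comma or a double quote would corrupt its row. A minimal escaping helper, offered as an addition rather than part of the original code, could look like this:

// Quote a CSV field per RFC 4180: wrap it in double quotes and double
// any embedded quotes, but only when the value actually needs it.
const escapeCsvField = (value: string): string =>
  /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;

// Usage inside the generation loop:
// writer.write(`${row.map(escapeCsvField).join(',')}\n`);

PostgreSQL's CSV mode in COPY accepts quoted fields, so the import side needs no changes.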

Conclusion

This method imports one million records into PostgreSQL in just 3 seconds!

With streaming and careful use of PostgreSQL features, you can easily handle large datasets. If you’re looking to scale your data workflows, this approach is a robust starting point.

Try It Out: Modify the code to fit your dataset and database. Happy coding!

https://github.com/vdelacou/postgresql-bulk-data-generator
