Efficient Data Import in PostgreSQL with Node.js
When working with large datasets, efficiently importing millions of records into a database can be daunting.

This article walks you through a performant approach using Node.js, PostgreSQL, and CSV files, leveraging streaming. The goal is to efficiently insert one million records into a PostgreSQL database while managing resource usage.
If you are in a hurry to see the code, it is available here: https://github.com/vdelacou/postgresql-bulk-data-generator
Key Concepts
- Streaming: Using streams for reading and writing large files incrementally, instead of loading them fully into memory.
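To make that idea concrete, here is a minimal sketch (the file paths are placeholders) that copies a large file chunk by chunk with Node's stream pipeline, so memory usage stays flat no matter how big the file is:

import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

// Copies the file in small chunks; nothing close to the whole file
// is ever held in memory (paths are placeholders).
const copyLargeFile = async (): Promise<void> => {
  await pipeline(
    createReadStream('./big-input.csv'),
    createWriteStream('./big-output.csv'),
  );
};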
Why This Approach?
This approach uses Node.js’s asynchronous nature combined with PostgreSQL’s COPY command for bulk data import. The key benefits include:
- Performance: The COPY command is optimized for bulk inserts.
- Memory Efficiency: Data is streamed, avoiding excessive memory usage.
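A practical note: COPY ... FROM STDIN is used here (rather than COPY ... FROM a server-side file path) because the CSV lives on the machine running Node.js, not on the database server. The target table also has to exist before the copy runs; below is a minimal sketch, assuming the consumers table only needs the two columns used in this article:

import { Pool } from 'pg';

// Assumed schema: just the two columns referenced by the COPY command.
const createConsumersTable = async (pool: Pool): Promise<void> => {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS consumers (
      name TEXT NOT NULL,
      email TEXT NOT NULL
    )
  `);
};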
Importing CSV Data
The COPY command efficiently loads data from the CSV file into the specified table in the database. Below is the implementation using an efficient streaming pipeline:
import { createReadStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import { from } from 'pg-copy-streams';
import type { PoolClient } from 'pg';

const copyCsvToTable = async (client: PoolClient): Promise<void> => {
  // Read the CSV incrementally instead of loading it into memory
  const fileStream = createReadStream('./data.csv');
  // COPY ... FROM STDIN turns the query into a writable stream
  const pgStream = client.query(from(`COPY consumers (name,email) FROM STDIN WITH (FORMAT csv, HEADER true)`));
  try {
    await pipeline(fileStream, pgStream);
    logger.info('CSV data copy completed successfully');
  } catch (error) {
    logger.error('Error during copy:', error);
    throw error;
  }
};
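The function above expects an already checked-out PoolClient. Here is a usage sketch, assuming a standard pg Pool configured through the usual PG* environment variables:

import { Pool } from 'pg';

// Connection settings come from PGHOST, PGUSER, PGPASSWORD, PGDATABASE, etc.
const pool = new Pool();

const runImport = async (): Promise<void> => {
  const client = await pool.connect();
  try {
    await copyCsvToTable(client);
  } finally {
    client.release(); // always return the client, even if the copy fails
  }
};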
Detailed Steps
- Open File Stream: Use the createReadStream function to open a read stream for the CSV file.
- Set Up PostgreSQL Stream: Use the COPY command to prepare PostgreSQL to receive the data. The from method from the pg-copy-streams library enables streaming capability.
- Stream Data: Use the pipeline utility to connect the file stream to the PostgreSQL stream. This streams the data from the file directly into the database (you can also splice extra steps into the pipeline, as sketched below).
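Because pipeline accepts any number of intermediate streams, extra steps can sit between the file and PostgreSQL. A small sketch of a hypothetical progress counter (not part of the original project):

import { Transform } from 'node:stream';

// A pass-through Transform that counts bytes as they flow,
// handy for logging progress on very large files.
const progressCounter = (): Transform => {
  let bytes = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      bytes += chunk.length;
      callback(null, chunk); // pass data through unchanged
    },
    flush(callback) {
      logger.info(`Streamed ${bytes} bytes to PostgreSQL`);
      callback();
    },
  });
};

// Usage: await pipeline(fileStream, progressCounter(), pgStream);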
Benefits of Streaming with COPY
- Efficiency: The COPY command processes the file as a continuous stream, significantly reducing overhead compared to inserting rows individually.
- Error Handling: The try-catch block ensures any errors during the streaming process are logged and rethrown for proper handling.
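A simple way to confirm the import worked is a row count once the pipeline has finished; a quick sketch, reusing the same client and the project's logger:

// Sanity check after the COPY completes.
const verifyImport = async (client: PoolClient): Promise<void> => {
  const result = await client.query('SELECT COUNT(*) AS total FROM consumers');
  logger.info(`Rows in consumers: ${result.rows[0].total}`);
};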
Extra: Generating a Fake CSV File
The first step is to create a CSV file containing mock data. This file will act as the data source for importing into PostgreSQL.
import { unlink } from 'node:fs/promises';
import { faker } from '@faker-js/faker';

const generateRow = (): [string, string] => [
  faker.person.fullName(),
  faker.internet.email(),
];

const generateCsvFile = async (): Promise<void> => {
  const csvFile = Bun.file('./data.csv');
  // Delete any previous run's output so we start from a clean file
  if (await csvFile.exists()) {
    await unlink('./data.csv');
  }
  const writer = csvFile.writer();
  // Header row expected by COPY ... WITH (HEADER true)
  writer.write('name,email\n');
  // config.itemsCount is defined elsewhere in the project (e.g. 1,000,000)
  for (let i = 0; i < config.itemsCount; i++) {
    const row = generateRow();
    writer.write(`${row.join(',')}\n`);
  }
  writer.end();
};
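One caveat with row.join(',') above: a generated value could in principle contain a comma or a double quote, which would corrupt that CSV row. A defensive sketch (the escapeCsvValue helper is hypothetical, not part of the original project) that quotes each field in RFC 4180 style; COPY ... WITH (FORMAT csv) handles quoted fields natively:

// Quote a field and escape embedded double quotes,
// so commas or quotes inside a value cannot break the row.
const escapeCsvValue = (value: string): string =>
  `"${value.replace(/"/g, '""')}"`;

// Then, inside the loop above:
// writer.write(`${row.map(escapeCsvValue).join(',')}\n`);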
Detailed Steps
- Check for Existing File: If the CSV file already exists, delete it to ensure a fresh start.
- Open a Writer Stream: Use a writer stream to incrementally write data to the CSV file.
- Generate Mock Data: Use the faker library to create realistic names and emails for each row.
- Write Data to CSV: Write each row to the file, separating columns with commas.
- Close the Stream: Once all rows are written, close the writer to finalize the file.
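Putting the pieces together, an end-to-end run might look like the sketch below; the pool and runImport come from the usage sketch earlier, and the timing line is only there so you can check the import speed on your own hardware:

const main = async (): Promise<void> => {
  await generateCsvFile();   // write the mock CSV first
  const start = performance.now();
  await runImport();         // pool/client handling as sketched earlier
  logger.info(`Import finished in ${Math.round(performance.now() - start)} ms`);
};

main().catch((error) => {
  logger.error('Run failed:', error);
  process.exit(1);
});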
Conclusion
This method efficiently handles importing one million records into PostgreSQL in 3 seconds!
With streaming and careful use of PostgreSQL features, you can easily handle large datasets. If you’re looking to scale your data workflows, this approach is a robust starting point.
Try It Out: Modify the code to fit your dataset and database. Happy coding!