Mombu the Php Forum sponsored links

Go Back   Mombu the Php Forum > Php > a better way to do a data import?
User Name
Password
REGISTER NOW! Mark Forums Read

sponsored links


Reply
 
1 13th April 18:39
blackwaterdev
External User
 
Posts: 1
Default a better way to do a data import?



I have a text file that contains 200k rows. These rows are to be imported
into our database. The majority of them will already exists while a few are
new. Here are a few options I've tried:

I've had php cycle through the file row by row and if the row is there,
delete it and do a straight insert but that took a while.

Now I have php get the row from the text file and then to array_combine with
a default array I have in the class so I can have key value pairs. I then
take that generated array and do array_diff to the data array I pulled from
the db and I then have the columns that are different so I do an update on
only those columns for that specific row. This is slow and after about
180,000 rows, php throws a memory error. I'm resetting all my vars to NULL
at each iteration so am not sure what's up.


Anyone have a better way to do this? In MySQL, I could simply a replace on
each row...but not in postgres.

Thanks!
  Reply With Quote


  sponsored links


2 13th April 18:39
eric.butera
External User
 
Posts: 1
Default a better way to do a data import?



If it is a memory error perhaps you could just do a select of the
primary key in your existing recordset first. Then you could iterate
over each of the new records to be inserted and say if !isset(new
record pk) then insert. This way the only thing in memory is the
current primary keys and a single new record. This might not work
though.

If that doesn't work, perhaps you could know that your primary key is
auto incrementing. On each update just insert records that are
greater than the previous max value?

One possible other solution. I have one data set that I have to
update each night from an XML feed. I just rip through the file and
insert the records with a given timestamp. Then once I am complete I
delete all the previous records with a timestamp that is less than the
current insert. This is for a very specific thing though and might
not be right for your needs.

Good luck!
  Reply With Quote
3 13th April 18:39
robert
External User
 
Posts: 1
Default a better way to do a data import?


Does Postgres support any method of temporarily disabling keys/indexing?
Indexing is what causes the inserts to take a while. MySQL can optimize
an import by locking the table and allowing the keys/indexes to be
temporarily disabled. You'll see the following lines in recent MySQL
database dumps surrounding the inserts:

/*!40000 ALTER TABLE `***` DISABLE KEYS */;
INSERT ...
INSERT ...
/*!40000 ALTER TABLE `***` ENABLE KEYS */;

Cheers,
Rob.
--
.................................................. ..........
SwarmBuy.com - http://www.swarmbuy.com

Leveraging the buying power of the masses!
.................................................. ..........
  Reply With Quote
4 13th April 18:39
blackwaterdev
External User
 
Posts: 1
Default a better way to do a data import?


I think that's possible, so I'll give it a shot.

For some reason, even with straight inserts my php script is dying around
180,000 rows. Basically, I took out all the compare/update code so now I
grab the row from the db and if there isn't one, do an insert. I've wiped
my db so should do straight inserts so I'm not sure what's taking up the
memory.

private function processFile($table, $key){

$this->openFileForReading(); //foudn in GLFile class
while (!feof($this->fileHandle)) {
$file_data = fgets($this->fileHandle);

$this->insert($table, $key, $file_data);

}
$this->closeFile();

}

private function insert($table, $key, $data){
if (strlen($data)<10) return false;

$data=$this->db->escape_string($data);

//this is the data we will use should we need to do an insert
$insert_data = str_replace("\"", "'", $data);

//this is a hack we need to change the separator of the file
//we need this because we need to put the data into an array and if
//we simply use the comma, then it splits address fields
$data = str_replace("\",\"", "~", $data);
$data = str_replace("\"","",$data); //let's remove the double quotes
$this->setDelimiter("~");
$dataToArray=$this->stringToArray($data);
//set it back for other functions
$this->setDelimiter(",");

//get the id, we trust it is the first column


$key_data=$dataToArray[0];

//does the value exist in the database already?
$sql="select * from prelim_$table where $key='$key_data'";
$handle=$this->db->select($sql);
if ($this->db->row_count($handle)>0){
$textData=array();
$colsToUpdate="";
$dataRow="";
$dataRow=$this->db->fetch_row($handle);
//now that we have the data, let's merge the row from the
//file with the column names

$textData=array_combine($this->carrierColumns,
$dataToArray);
//cast some values that are strings in the text file
$textData['cars1']=(int) $textData['cars1'];
$textData['car_amount']=(int)
$textData['car_amount'];

$textData['total_work']=trim($textData['total_work']);

$textData['business_zip']=trim($textData['business_zip']);
//clean up some old db data
$dataRow['rfc_number']=trim($dataRow['rfc_number']);

$dataRow['business_zip']=trim($dataRow['business_zip']);

$colsToUpdate=array_diff($textData,$dataRow);

//if we have columns to update, do it
if (count($colsToUpdate)>0){

$colset="";
foreach ($colsToUpdate as $column=>$value){
$colset.="$column='$value',";
}
//strip last comma
$colset=substr($colset, 0, -1);
$sql="update prelim_$table set $colset where
$key='$key_data'";
$this->db->write($sql);

}

$dataRow=NULL;
$colsToUpdate=NULL;
$colset=NULL;
$textData=NULL;
}
else{

//insert the row
$sql="insert into prelim_$table values (";
$sql.=trim($insert_data);
$sql.=");";
$this->db->write($sql);
}


}
  Reply With Quote
5 13th April 18:39
eric.butera
External User
 
Posts: 1
Default a better way to do a data import?


Hi!

Another option for working with this came to me after looking at your
code. If you have a new recordset and the old one you must be getting
them from somewhere. Is it possible to just do a diff against the two
sets if they are raw SQL exports? Maybe then you could do
insert/updates based on that.

I have a couple of non-related suggestions that I think might help.
If not then just ignore them.

- "\"" can be written as tick quote tick instead of quote escape quote
quote (' " ') so that you don't have to escape it.

- Your database object could return a result object on queries. If
you make it return a result object then your query result gets shoved
into that. So you'd have a db_pgsql and db_pgsql_result that way you
don't have to pass around that handle everywhere. You just run
$result = $db->query(sql). Then result contains all the methods for
fetching your data.

- If you can run this locally you can use XDebug to figure out what
the problem is with your memory issue. Also you can use the trace
feature to see what is going on under the hood.
  Reply With Quote
6 13th April 22:30
bastien_k
External User
 
Posts: 1
Default a better way to do a data import?


what about uploading the entire file into a [semi]temp table..then doing cross table comparisons to load your main table with the data?
  Reply With Quote
7 13th April 22:30
dmagick
External User
 
Posts: 1
Default a better way to do a data import?


// clone the table.
create temporary table x as select * from othertable limit 1;

in mysql:
load data infile '/path/to/csv' into table 'x';

(http://dev.mysql.com/doc/refman/4.1/en/load-data.html)

in postgres:

\copy x from '/path/to/csv'

(no semi-colon on the end).
****yze; // make sure you do this otherwise the next step is going to
take forever.

step2:
delete stuff from the temp table that already exists:

delete from x where id in (select id from othertable);

or some such variant.


step3: insert left over stuff into real table.

--
Postgresql & php tutorials
http://www.designmagick.com/
  Reply With Quote
8 13th April 22:30
frozendice
External User
 
Posts: 1
Default a better way to do a data import?


Couldn't you just use PHP to rewrite the 200k row text file from just text,
to actual insert commands? Once you have that, most mysql interfaces have
an import ability where you can give it a file which is a list of commands
(insert, update, etc.) and it will run them itself. That should be much
faster.

- Dan
  Reply With Quote
9 13th April 22:51
ceo
External User
 
Posts: 1
Default a better way to do a data import?


If you unset/NULL out *every* variable, then you should not run out of
RAM...

It might be a heck of a lot faster to LOAD DATA INFILE to a temp
table, and then use SQL statements to compare the data and
update/insert any altered/missing data...

I do something similar with about the same number of rows, so it can
be done...

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
  Reply With Quote
Reply


Thread Tools
Display Modes




Copyright © 2006 SmartyDevil.com - Dies Mies Jeschet Boenedoesef Douvema Enitemaus -
666