1 6th May 13:20
ash
Reading a Word document from PHP



Hi All,

I recently asked a question about reading a PDF with PHP. I've tried
Zend_Pdf, but all it can give me is the number of pages in a PDF; it
cannot extract the text from the PDF files I have. I thought I'd try a
different method and extract the text straight from the Word document
that is used to generate the PDF. Does anyone have any experience with
this sort of thing, or know enough to suggest a library that is capable
of this?


Ash
http://www.ashleysheridan.co.uk
2 6th May 13:20
phpster



These may help:

http://drewd.com/2007/01/25/reading-from-a-word-document-with-com-in-php

http://www.phpclasses.org/browse/package/388.html

http://www.developertutorials.com/blog/php/extracting-text-from-word-documents-via-php-and-com-81/


--

Bastien

Cat, the other other white meat
3 6th May 13:20
ash


Unfortunately all of those use COM, which is only available on
Windows... I'm guessing this isn't possible on a proper OS?


Ash
http://www.ashleysheridan.co.uk
4 6th May 13:21
phpster


Sadly, no. The closest you could come would be to see whether you
can drive OpenOffice via exec() or something like that to read the
document and do what you need.

--

Bastien

Cat, the other other white meat
5 6th May 13:21
tmboyd1


There's a Python script described in this article:

http://www.gsdesign.ro/blog/php-convert-microsoft-word-doc-to-pdf/

...that sounds like it will do what you want. Sucks that it's using
Python, but at least it's a technology that isn't hard to put on a Linux
machine. Other than that, I would recommend Mono, perhaps, and using a
.NET DLL (or similar construct).

HTH,


Todd Boyd
Web Programmer
6 6th May 13:21
eric.butera


Ah that's too bad. Sorry about the bad tip!
7 6th May 13:21
clive_lists


Hi, I know the new Microsoft docx format is XML-based (a zip archive of
XML files), so you could probably use an XML parser on it.
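
A minimal sketch of that idea, assuming the file really is a Word 2007+ .docx and that PHP's zip and DOM extensions are installed. The helper name extractDocxText() is made up for illustration; it reads the word/document.xml entry from the archive and joins the text runs of each paragraph. It ignores tabs, line breaks, headers and footers, and text boxes nested inside a paragraph would be counted twice.

```php
<?php
// Sketch only: extractDocxText() is a hypothetical helper, not a library API.
// A .docx file is a zip archive; the main body text is in word/document.xml.

function extractDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No word/document.xml found in $path");
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException("word/document.xml is not well-formed XML");
    }

    // WordprocessingML elements live in this namespace.
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    // One output line per paragraph (w:p), built from its text runs (w:t).
    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }

    return implode("\n", $paragraphs);
}
```

Usage would look like `echo extractDocxText('upload.docx');`, where upload.docx is a placeholder path. This does nothing for the older binary .doc format, which is not XML.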

Any chance you can get them to use an RTF file instead of a Word file to
convert to PDF? RTF is mostly readable text with some control words
thrown in for formatting.
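
To illustrate what "control words thrown in" means, here is a rough sketch of stripping them in PHP. The function name rtfToText() and the list of skipped groups are assumptions made for this example, and it is nowhere near a complete RTF parser: hex escapes such as \'e9 are dropped, and \uN Unicode escapes leave only their fallback character behind.

```php
<?php
// Naive sketch: rtfToText() is a hypothetical helper, not a library API.
// It walks the RTF once, drops control words, and skips metadata groups.

function rtfToText(string $rtf): string
{
    // Groups whose contents are metadata rather than body text (assumed list).
    $skip = ['fonttbl', 'colortbl', 'stylesheet', 'info', 'pict'];
    $out = '';
    $stack = [];        // saved "ignoring" flag for every open { group
    $ignoring = false;  // true while inside a metadata group
    $len = strlen($rtf);
    $i = 0;

    while ($i < $len) {
        $c = $rtf[$i];
        if ($c === '{') {
            $stack[] = $ignoring;
            $i++;
        } elseif ($c === '}') {
            $ignoring = array_pop($stack) ?? false;
            $i++;
        } elseif ($c === '\\') {
            // Control word: backslash, letters, optional number, optional space.
            $j = $i + 1;
            while ($j < $len && ctype_alpha($rtf[$j])) {
                $j++;
            }
            $word = substr($rtf, $i + 1, $j - $i - 1);
            if ($word !== '') {
                if ($j + 1 < $len && $rtf[$j] === '-' && ctype_digit($rtf[$j + 1])) {
                    $j++;
                }
                while ($j < $len && ctype_digit($rtf[$j])) {
                    $j++;
                }
                if ($j < $len && $rtf[$j] === ' ') {
                    $j++; // a single trailing space is a delimiter, not text
                }
                if (in_array($word, $skip, true)) {
                    $ignoring = true;
                } elseif (!$ignoring && ($word === 'par' || $word === 'line')) {
                    $out .= "\n";
                } elseif (!$ignoring && $word === 'tab') {
                    $out .= "\t";
                }
                $i = $j;
            } else {
                // Control symbol such as \\ \{ \} \* or \'hh.
                $sym = $rtf[$i + 1] ?? '';
                if ($sym === '*') {
                    $ignoring = true;          // ignorable destination
                } elseif ($sym === "'") {
                    $i += 2;                   // drop the two hex digits too
                } elseif (!$ignoring && in_array($sym, ['\\', '{', '}'], true)) {
                    $out .= $sym;              // escaped literal character
                }
                $i += 2;
            }
        } elseif ($c === "\r" || $c === "\n") {
            $i++; // raw line breaks in the file are not part of the text
        } else {
            if (!$ignoring) {
                $out .= $c;
            }
            $i++;
        }
    }

    return $out;
}
```

For a small input like `{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Hello \b world\b0 !\par Second line}` this yields "Hello world!" and "Second line" on two lines. Real files exported from Word carry many more group types, so treat this as a starting point only.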

clive
8 6th May 13:21
ash


No worries about the tip, it was a good tip.

Unfortunately I'm stuck trying to extract text from a Word document or a
PDF file because she doesn't know how to make a CSV in Excel, despite me
showing her how to do it. She kept trying to upload a PDF to the site
and wondered why it wasn't able to pick out the text from that!
Obviously an attempt was made to shift the blame to me for not building
the site right!


Ash
http://www.ashleysheridan.co.uk
9 6th May 13:22
robert


If it were me... and I REALLY needed to do this... I'd probably start
looking into how to make OpenOffice do a command-line conversion of Word
documents to plain text.
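
A rough sketch of what that could look like from PHP. It assumes a `soffice` binary on the PATH that accepts `--headless` and `--convert-to` (current LibreOffice builds do; older OpenOffice.org installs may need a macro or a separate converter script instead). The helper names wordToTextCommand() and wordToText() are invented for this example.

```php
<?php
// Sketch only: both helpers are hypothetical, and `soffice` with
// --headless/--convert-to support is an assumption about the server.

// Build the shell command that converts one file to plain text.
function wordToTextCommand(string $docPath, string $outDir): string
{
    return sprintf(
        'soffice --headless --convert-to txt:Text --outdir %s %s',
        escapeshellarg($outDir),
        escapeshellarg($docPath)
    );
}

// Run the conversion and return the extracted text.
function wordToText(string $docPath): string
{
    $outDir = sys_get_temp_dir();
    exec(wordToTextCommand($docPath, $outDir) . ' 2>&1', $output, $status);

    // soffice writes <original basename>.txt into the output directory.
    $txtPath = $outDir . DIRECTORY_SEPARATOR
        . pathinfo($docPath, PATHINFO_FILENAME) . '.txt';

    // Check for the output file as well as the exit status, since a zero
    // status alone does not prove that a file was produced.
    if ($status !== 0 || !is_file($txtPath)) {
        throw new RuntimeException(
            "Conversion failed: " . implode("\n", $output)
        );
    }

    $text = file_get_contents($txtPath);
    unlink($txtPath);
    if ($text === false) {
        throw new RuntimeException("Could not read $txtPath");
    }

    return $text;
}
```

Calling `wordToText('/path/to/upload.doc')` (a placeholder path) would then return the document body as a string. Passing both paths through escapeshellarg() matters here because the file name comes from a user upload.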

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP
10 6th May 13:22
mickael+php


Robert Cummings wrote:

If you're on Debian, look for the openoffice.org-headless package.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org