I recently asked a question about reading a PDF with PHP. I've tried
Zend_Pdf, but all it gives me is the number of pages in a PDF; it
cannot extract the text from the PDF files I have. I thought I'd try a
different approach and extract the text straight from the Word
document that is used to generate the PDF. Does anyone have any
experience with this sort of thing, or know enough to suggest a library
that can do it?
Unfortunately all of those use COM, which is only available on
Windows... I'm guessing this isn't possible on a proper OS?
Ash
http://www.ashleysheridan.co.uk
ash
6th May 13:21
phpster
Reading a Word document from PHP
Sadly, no... the closest you could come would be to see if you
can drive OpenOffice via exec() or something like that to read the
document and do what you need to.
--
Bastien
Cat, the other other white meat
6th May 13:21
tmboyd1
There's a Python script described in this article:
....that sounds like it will do what you want. Sucks that it's using
Python, but at least it's a technology that isn't hard to put on a Linux
machine. Other than that, I would recommend Mono, perhaps, and use a
.NET DLL (or similar construct).
HTH,
Todd Boyd
Web Programmer
6th May 13:21
eric.butera
Ah that's too bad. Sorry about the bad tip!
6th May 13:21
clive_lists
Hi, I know the new Microsoft docx format is an XML document, so you could
probably use the XML parser with that.
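A minimal sketch of that idea, written in Python rather than PHP: a .docx file is a ZIP archive whose body text lives in `word/document.xml`, with each paragraph (`w:p`) made up of text runs (`w:t`). The function name `docx_text` is hypothetical, and this only covers the newer .docx format; the older binary .doc format is not XML and will not open this way. In PHP the same approach would map onto `ZipArchive` plus an XML parser such as `DOMDocument`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the plain text of a .docx (path or file object), one line per paragraph."""
    with zipfile.ZipFile(source) as zf:
        # The document body is stored as XML inside the ZIP container
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is split across one or more <w:t> runs
        runs = [t.text or "" for t in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Usage would be something like `print(docx_text("upload.docx"))`, where the file name is only an example. Formatting, images and headers/footers (stored in other parts of the archive) are ignored.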
Any chance you can get them to use an RTF file instead of a Word file to
convert to PDF? RTF is mostly readable text with some control words
thrown in for formatting.
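A rough Python sketch of why RTF is approachable: the text can be recovered by tokenizing control words, braces and plain runs, then dropping metadata groups such as the font table. This is a simplified illustration, not a full RTF parser. The `SKIP_DESTINATIONS` set is a partial, assumed list, `\'hh` escapes are assumed to be Windows-1252, and Unicode escapes (`\uN`) are not decoded. `rtf_to_text` is a hypothetical name.

```python
import re

# Groups whose contents are metadata rather than body text.
# Assumption: a partial list; real RTF defines many more destinations.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "header", "footer"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional numeric argument, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \{ \} \\ \*
    r"|([{}])"                   # group open / close
    r"|([^\\{}\r\n]+)"           # run of plain text
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
)


def rtf_to_text(rtf):
    """Strip RTF markup and return the visible text (simplified; \\uN not decoded)."""
    out = []
    stack = []      # skip state of each enclosing group
    skip = False    # True while inside a group we are ignoring
    for m in TOKEN.finditer(rtf):
        word, _arg, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skip)                      # remember the enclosing state
        elif brace == "}":
            skip = stack.pop() if stack else False  # restore it when the group closes
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif not skip and word in ("par", "line"):
                out.append("\n")
            elif not skip and word == "tab":
                out.append("\t")
            # every other control word (font, size, bold...) is formatting: ignore it
        elif symbol:
            if symbol == "*":
                skip = True                         # \* marks an optional destination: ignore the group
            elif not skip and symbol in "\\{}":
                out.append(symbol)                  # escaped literal backslash or brace
        elif hexcode:
            if not skip:
                # Assumption: \'hh bytes are Windows-1252, typical for \ansi documents
                out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif text and not skip:
            out.append(text)
    return "".join(out)
```

Tables, lists and fields come out as flattened text, which may be enough for pulling data out of an uploaded file but not for preserving layout.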
clive
6th May 13:21
ash
No worries about the tip, it was a good tip.
Unfortunately I'm stuck trying to extract text from a Word document or a
PDF file because she doesn't know how to make a CSV in Excel, despite me
showing her how to do it. She kept trying to upload a PDF to the site
and wondered why it wasn't able to pick out the text from that!
Obviously an attempt was made to shift the blame to me for not building
the site right!
Ash
http://www.ashleysheridan.co.uk
6th May 13:22
robert
If it were me... and I REALLY needed to do this... I'd probably start
looking into how to make Open Office do a command-line conversion of Word
documents to plain text.
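One way that could be scripted, assuming a LibreOffice or later OpenOffice.org install whose `soffice` binary accepts `--headless` and `--convert-to` (older builds may need a different invocation). The `txt:Text` filter name and reading the output as UTF-8 are assumptions to check against the installed version, and `build_convert_command` / `word_to_text` are hypothetical helper names. The sketch is in Python; from PHP the same argument list could be passed to `exec()` with each argument wrapped in `escapeshellarg()`.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, binary="soffice"):
    """Build the argument list for a headless Word-to-text conversion."""
    return [
        binary,
        "--headless",                # run without a GUI (no X display needed)
        "--convert-to", "txt:Text",  # target extension:export filter (assumed filter name)
        "--outdir", str(out_dir),    # directory where the .txt file is written
        str(doc_path),
    ]


def word_to_text(doc_path, out_dir="/tmp", binary="soffice"):
    """Convert a .doc/.docx file to plain text and return its contents."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    # An argument list (no shell) keeps unusual file names from being interpreted
    subprocess.run(build_convert_command(doc_path, out_dir, binary),
                   check=True, capture_output=True, timeout=120)
    # The converter names its output after the input file, with a .txt extension
    txt_path = Path(out_dir) / (Path(doc_path).stem + ".txt")
    # Assumption: the export is UTF-8; undecodable bytes are replaced, not raised
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

A call such as `word_to_text("upload.doc")` (example file name) would leave `upload.txt` in `/tmp` and return its text; on a busy server the conversion is slow enough that it may be worth running outside the request.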
Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP
6th May 13:22
mickael+php
Robert Cummings wrote:
If you're on Debian, look for the openoffice.org-headless package.