1 15th February 01:41
yves-alain.nicollet
How to generate links within PDF files (yes, again!)



I have installed Ghostscript 8.0 and, yep, it works.

I have finished my script, and as promised, I post it below.

I just wanted to add that you should never be discouraged when someone
says: "It's not possible." because when you try as hard as possible,
you (almost) always end up finding a solution.

My script might still have some bugs, but I have tested it against
several PostScript files, some of them larger than 150 MB, and I'm
quite happy with it.

I think it should run on any flavour of Un*x, including any
distribution of Linux (sorry, Windows users; someone would have to
translate it into another scripting language).

Thanks to all those who gave me their advice.

Yves

========================= CUT HERE =========================
#!/bin/ksh
######################################################################
#
# name:
#     mkpdflinks
#
# author: Yves-Alain.Nicollet (07/25/2003)
#
# purpose:
#     insert pdfmarks into a PostScript file to create PDF links
#
# assumptions:
#   - the PostScript file is at least level 2 and contains the
#     comments: %%Page: (x-y) nn
#   - each page starts with the macro bop, defined as follows:
#         /pgstate 0 def/DPIx 0 def/DPIy 0 def
#         /bdf {bind def} bind def
#         /bop {/pgstate save def} bdf
#   - the title of the table of contents contains the word Contents
#   - the front matter is numbered in lower-case roman
#     (i, ii, iii, iv,...)
#   - the table of contents is inside the front matter
#   - the body is numbered according to sections:
#         1-1, 1-2, 2-1, ...      chapters
#         A-1, A-2, B-1, B-2, ... appendices
#         X-1, X-2, ...           index
#   - the page number is printed less than 60 points from the
#     page bottom
#   - page references are introduced by the word "page", as in
#         see details on page 4-12
#   - the index references are at the end of a line in the index;
#     there may be several, separated by , or ; or .
#
# principle:
#     using the output of pstotext -bboxes, create a file of shell
#     commands that inserts, after the relevant bop in the PostScript
#     file, lines containing pdfmarks like:
#     [ /Rect [508 741 525 754] /Page 59 /Border [0 0 1] /Color [0 0 1]
#       /Subtype /Link /ANN pdfmark
#
# requires:
#     pstotext, ps2pdf, Ghostscript 8.0 or higher
#
# disclaimer:
#     this script is provided as is with no warranty; it can be used,
#     modified, redistributed, for any purpose;
#     any enhancement appreciated (please email me a copy :-)
#
######################################################################

# clean up temporary files on exit (signal 9 cannot be trapped, so it is not listed)
trap 'rm -f /tmp/x.[spbc][hgbo].$PID nw_$INPUT' 0 1 2 3 15

## file management
case $1 in
    *.ps|*_ps)
        INPUT_DIR=`dirname "$1"`
        INPUT=`basename "$1"`
        # the unescaped dot matches both ".ps" and "_ps"
        OUTPUT=`echo "$INPUT"|sed 's/.ps$/.pdf/'`
        cd "$INPUT_DIR" || exit 1
        export INPUT_DIR INPUT OUTPUT
        ;;
    *)
        echo "usage: `basename $0` inputfile[._]ps" >&2
        exit 1
        ;;
esac

PID=$$
GS_LIB=/usr/local/share/ghostscript/8.00
export PID GS_LIB
echo 'Getting document properties... \c' >/dev/tty

## map page number (logical, sequential, line number)
awk '/%%Page:/,/^bop$/ {
    # use an explicit format so the % signs in $0 are not taken
    # as printf conversions
    if ($0 ~ /%%Page:/) printf "%s", $0
    if ($0 == "bop") print " " NR
}' $INPUT >/tmp/x.pg.$PID

## get bounding boxes
cat $INPUT |
pstotext -bboxes |
# change the soft hyphen (octal 255) to - (it looks like - but is not)
tr '\255' '-' > /tmp/x.bb.$PID

## map document parts
TOC=`awk 'NR==1,$NF=="Contents" {print NR,$0}' /tmp/x.bb.$PID|
awk '$NF ~ /^[ivx][ivx]*$/ && $5 < 60'|tail -1|awk '{print $1}'`
export TOC
BODY=`awk '$NF == "1-1" && $4 < 60 {print NR}' /tmp/x.bb.$PID`
export BODY
INDEX=`awk '$NF == "X-1" && $4 < 60 {print NR}' /tmp/x.bb.$PID`
export INDEX

## compute and set pdfmarks
echo '\nLooking for links... \c' >/dev/tty

cat /tmp/x.bb.$PID |
awk '
function PdfMark(Coordinates,PageNum) {
    ##### WARNING #####
    # each "system" command below must stay on a single line
    if (system("grep \047Page: (" PageNum ")\047 /tmp/x.pg." PID " >/dev/null") == 0) {
        printf "%s ", Coordinates
        system("grep \047Page: (" PageNum ")\047 /tmp/x.pg." PID " | awk \047{printf $3}\047")
        printf " /Border [0 0 1] /Color [0 0 1] /Subtype /Link /ANN"
        print " pdfmark"
    }
}

BEGIN {
    PID=ENVIRON["PID"]
    TOC=ENVIRON["TOC"]
    BODY=ENVIRON["BODY"]
    INDEX=ENVIRON["INDEX"]
    DONE=""
}

{
    # where are we in the document
    if (NR >= TOC && NR < BODY) { IN_TOC="true"; IN_BODY="false"; IN_INDEX="false" }
    else if (NR >= BODY && NR < INDEX) { IN_BODY="true"; IN_TOC="false"; IN_INDEX="false" }
    else if (NR >= INDEX) { IN_INDEX="true"; IN_TOC="false"; IN_BODY="false" }

    # is this a page number or a reference
    if ($4 <= 60 && ((IN_TOC=="true" && $NF ~ /^[ivx][ivx]*$/) ||
        (IN_BODY=="true" && $NF ~ /^[0-9A-W][0-9A-W]*-[0-9][0-9]*$/) ||
        (IN_INDEX=="true" && $NF ~ /^X-[0-9][0-9]*/))) PAGE="true"
    else PAGE="false"

    if (PAGE == "true") {
        PG=$NF
        # PG: page num where the link is to be set (no link within TOC)
        # compare against the value of PG, not the literal text " PG "
        if (index(DONE, " " PG " ") == 0) {
            DONE=DONE " " PG " "
            print "\n"
            print "\n" > "/dev/tty"
            ##### WARNING #####
            # the "system" command below must stay on a single line
            system("grep \047Page: (" PG ")\047 /tmp/x.pg." PID " | tee /dev/tty | awk \047{print $NF}\047")
        }
    }
    if (IN_BODY=="true" && $4 > 60 &&
        ($0 ~ /page [0-9A-Z][0-9A-Z]*-[0-9][0-9]*/ ||
         $0 ~ /page [ivx][ivx]*/)) {
        # internal link (.* page num .* [bug in pstotext?])
        gsub (/ /," ")
        $1=$1-2
        $3=$3+2
        co="[ /Rect [" $1 " " $2 " " $3 " " $4 "] /Page"
        sub (/.*page /,"")
        sub (/[\.,; ].*/,"")
        sub (/\.$/,"",$1)
        if ($1 ~ /^[0-9A-Z][0-9A-Z]*-[0-9][0-9]*$/ ||
            $1 ~ /^[ivx][ivx]*$/) {
            PdfMark(co,$1)
            printf "%s ", $1 > "/dev/tty"
        }
    }
    else if (IN_BODY=="true" && $4 > 60 && $NF == "page") {
        getline
        if ($4 > 60 && ($NF ~ /^[0-9A-Z][0-9A-Z]*-[0-9][0-9]*[\.,;]*$/ ||
            $NF ~ /^[ivx][ivx]*[\.,;]*$/)) {
            # internal link (page num)
            gsub (/ /," ")
            $1=$1-2
            $3=$3+2
            co="[ /Rect [" $1 " " $2 " " $3 " " $4 "] /Page"
            sub (/[\.,;]$/,"")
            PdfMark(co,$NF)
            printf "%s ", $NF > "/dev/tty"
        }
    }
    else if ($4 > 60 && (IN_TOC=="true" || IN_INDEX=="true") &&
        ($NF ~ /^[0-9A-Z][0-9A-Z]*-[0-9][0-9]*[\.,;]*$/ ||
         $NF ~ /^[ivx][ivx]*[\.,;]*$/)) {
        # table of contents or index (CH-PG)
        gsub (/ /," ")
        $1=$1-2
        $3=$3+2
        co="[ /Rect [" $1 " " $2 " " $3 " " $4 "] /Page"
        sub (/[\.,;]$/,"")
        PdfMark(co,$NF)
        printf "%s ", $NF > "/dev/tty"
    }
}
END {print "\n\n\n\n"}' |
## separate commands
sed '/pdfmark/{
:loop
N
/\n\n/b end
b loop
:end
s/\n\n/\
\
/
}' |
## get rid of extra commands (keep only pdfmarks inserts,
## ie. paragraphs containing the word pdfmark)
awk '{
    if ($0 ~ /^[0-9]/) {
        S=$0
        L=""
        getline
        if ($0 !~ /pdfmark/) next
        else {
            while ($0 ~ /pdfmark/) {
                L=L "\n" $0
                # stop at end of input instead of looping forever
                if ((getline) <= 0) break
            }
            print "\n" S L "\n"
        }
    }
}' >/tmp/x.co.$PID
echo '\nInserting links... \c' >/dev/tty

## make shell script
EOF=`wc -l $INPUT |awk '{print $1}'`
export EOF
echo 'awk \047NR==1,NR==\c' >/tmp/x.sh.$PID
cat /tmp/x.co.$PID|awk '
BEGIN {
    EOF=ENVIRON["EOF"]
    INPUT=ENVIRON["INPUT"]
    OUTPUT=ENVIRON["OUTPUT"]
}
{
    if ($0 ~ /^[0-9]/) {
        # close the pending awk range at this bop line, then remember
        # where the next range starts
        if (L) printf "awk \047NR==" L ",NR=="
        print $0 "\047 " INPUT " >> nw_" INPUT
        L=$0+1
    }
    else if ($0 ~ /pdfmark/) print "echo \047" $0 "\047 >> nw_" INPUT
}
END {
    print "awk \047NR==" L ",NR==" EOF "\047 " INPUT " >> nw_" INPUT
}' >>/tmp/x.sh.$PID

## apply
rm -f nw_$INPUT
sh /tmp/x.sh.$PID

## convert to pdf

echo '\nMaking PDF... \c' >/dev/tty

/usr/local/bin/ps2pdf \
-dDEVICEWIDTHPOINTS=595 -dDEVICEHEIGHTPOINTS=842 \
-sDEVICE=pdfwrite -dFIXEDMEDIA \
nw_$INPUT $OUTPUT

## list result
echo >/dev/tty
ls -l $OUTPUT

## end
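
To make the principle easier to follow in isolation, here is a minimal sketch of its two core steps: building the page map from the %%Page: and bop lines, then turning a bounding box plus a logical page label into a pdfmark line. The function names page_map and link_mark and the four-line sample input are assumptions made for illustration; they do not appear in mkpdflinks, which works on temporary files and real pstotext output instead.

```shell
#!/bin/sh
# Sketch only. page_map, link_mark and the sample input are hypothetical
# names and data used for illustration; they are not part of mkpdflinks.

# Read PostScript on stdin; for each page print the %%Page: comment
# followed by the line number of the "bop" that opens the page.
page_map() {
    awk '/%%Page:/,/^bop$/ {
        if ($0 ~ /%%Page:/) printf "%s", $0
        if ($0 == "bop") print " " NR
    }'
}

# link_mark llx lly urx ury label mapfile
# Print a link pdfmark for the given rectangle (in points) that jumps to
# the sequential page whose logical label is "label" in the map file.
# Returns 1 and prints nothing when the label is not in the map.
link_mark() {
    seq=$(grep "Page: ($5)" "$6" | awk '{print $3; exit}')
    [ -n "$seq" ] || return 1
    printf '[ /Rect [%s %s %s %s] /Page %s /Border [0 0 1] /Color [0 0 1] /Subtype /Link /ANN pdfmark\n' \
        "$1" "$2" "$3" "$4" "$seq"
}

# Demo on a four-line stand-in for a real PostScript file.
map=$(mktemp)
printf '%%%%Page: (i) 1\nbop\n%%%%Page: (1-1) 2\nbop\n' | page_map > "$map"
cat "$map"
link_mark 508 741 525 754 1-1 "$map"
rm -f "$map"
```

On the sample input the map holds one line per page (logical label, sequential number, bop line number), and the last command prints a pdfmark whose /Page value is 2, the sequential number of page 1-1.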
2 15th February 01:41
raoul



Hello,
can your script create pdfmarks for hyperlinks too, e.g.
http://www.somewhere.com? Or can it create pdfmarks only for the TOC or index?
I've been searching for a script/program to do this job for a long time, and
now it seems that I am getting somewhere! I am a Windows user and I only
know VB programming :-(.
So, is there a Windows programmer who has the kindness and the time to
translate Yves-Alain's script?

U R a Superstar :-)

--
Raoul
-ghostscript enthusiast-
3 15th February 01:41
yves-alain.nicollet


No, I'm sorry, the purpose of my script was only to create links
within a book, i.e. TOC, index and internal page cross-references. When
I need to create PDFs with URL-type links, I first create HTML files
and then convert them to PDF using HTMLDOC (http://www.easysw.com/htmldoc/),
a wonderful piece of freeware designed for that.
Yves
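
For reference, pdfmark itself can also describe a URL-type link, through a URI action instead of a /Page destination. The following is a minimal sketch, not something mkpdflinks does: the helper name uri_pdfmark is hypothetical, the border and color values are simply copied from the script above, and finding the rectangle that surrounds the URL text would still need a step like pstotext -bboxes.

```shell
#!/bin/sh
# Sketch only. uri_pdfmark is a hypothetical helper, not part of mkpdflinks.

# uri_pdfmark llx lly urx ury url
# Print a link pdfmark for the given rectangle (in points) that opens url.
uri_pdfmark() {
    # backslash-escape \ ( and ) so the URL is a valid PostScript string
    url=$(printf '%s' "$5" | sed 's/[\\()]/\\&/g')
    printf '[ /Rect [%s %s %s %s] /Action << /Subtype /URI /URI (%s) >> /Border [0 0 1] /Color [0 0 1] /Subtype /Link /ANN pdfmark\n' \
        "$1" "$2" "$3" "$4" "$url"
}

uri_pdfmark 72 700 200 712 'http://www.somewhere.com'
```

A line like this would be inserted after the page's bop in the same way as the /Page pdfmarks.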