PDF::Class zef:dwarring last updated on 2022-07-04

PDF::Class

This Raku module is the base class for PDF::API6.

PDF::Class provides a set of roles and classes that map to the internal structure of PDF documents. The aim is to make it easier to read and write valid PDF files.

It assists with the construction of PDF documents by handling type-checking and the sometimes finicky serialization rules for objects.

Description

The entry point of a PDF document is the trailer dictionary. This is mapped to PDF::Class. It contains a Root entry, which is mapped to a PDF::Catalog object, and may contain other entries, including an Info entry mapped to a PDF::Info object.

use PDF::Class;
use PDF::Catalog;
use PDF::Page;
use PDF::Info;

my PDF::Class $pdf .= open: "t/helloworld.pdf";

# vivify Info entry; set title
given $pdf.Info //= {} -> PDF::Info $_ {
    .Title = 'Hello World!';
    .ModDate = DateTime.now; # PDF::Class sets this anyway...
}

# modify Viewer Preferences
my PDF::Catalog $catalog = $pdf.Root;
given $catalog.ViewerPreferences //= {} {
    .HideToolbar = True;
}

# add a page ...
my PDF::Page $new-page = $pdf.add-page;
$new-page.gfx.say: "New last page!";

# save the updated pdf
$pdf.save-as: "tmp/pdf-updated.pdf";

This module is a work in progress. It currently defines roles and classes for many of the more commonly occurring PDF objects as described in the PDF 32000-1:2008 1.7 specification.

More examples:

Set Marked Info options

use PDF::Class;
use PDF::Catalog;
use PDF::MarkInfo;
my PDF::Class $pdf .= new;
my PDF::Catalog $catalog = $pdf.catalog; # same as $pdf.Root;
with $catalog.MarkInfo //= {} -> PDF::MarkInfo $_ {
    .Marked = True;
    .UserProperties = False;
    .Suspects = False;
}

Set Page Layout & Viewer Preferences

use PDF::Class;
use PDF::Catalog;
use PDF::Viewer::Preferences;

my PDF::Class $pdf .= new;

my PDF::Catalog $doc = $pdf.catalog;
$doc.PageLayout = 'TwoColumnLeft';
$doc.PageMode   = 'UseThumbs';

given $doc.ViewerPreferences //= {} -> PDF::Viewer::Preferences $_ {
    .Duplex = 'DuplexFlipShortEdge';
    .NonFullScreenPageMode = 'UseOutlines';
}
# ...etc, see PDF::ViewerPreferences

List AcroForm Fields

use PDF::Class;
use PDF::AcroForm;
use PDF::Field;

my PDF::Class $doc .= open: "t/pdf/samples/OoPdfFormExample.pdf";
with my PDF::AcroForm $acroform = $doc.catalog.AcroForm {
    my PDF::Field @fields = $acroform.fields;
    # display field names and values
    for @fields -> $field {
        say "{$field.key}: {$field.value}";
    }
}

Gradual Typing

In theory, we should always be able to use PDF::Class accessors for structured access and updating of PDF objects.

In reality, a fair percentage of PDF files contain at least some conformance issues (as reported by pdf-checker.raku) and PDF::Class itself is under development.

For these reasons it is possible to bypass PDF::Class accessors and access hashes and arrays directly, which gives raw access to the PDF data.

This also bypasses type coercions, so you may need to be more explicit. The following example forces PageMode to be set to an illegal value.

use PDF::Class;
use PDF::Catalog;
use PDF::COS::Name;
my PDF::Class $pdf .= new;

my PDF::Catalog $doc = $pdf.catalog;
try {
    $doc.PageMode = 'UseToes'; # illegal
    CATCH { default { say "err, that didn't work: $_" } }
}

# same again, bypassing type checking
$doc<PageMode> = PDF::COS::Name.COERCE: 'UseToes';
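
Raw access works for reading as well as writing. The following is a minimal sketch, using only the accessor and hash-subscript forms shown above. It sets an entry through its type-checked accessor, then reads the same entry back through the raw hash interface:

```raku
use PDF::Class;
use PDF::Catalog;

my PDF::Class $pdf .= new;
my PDF::Catalog $doc = $pdf.catalog;

# set a legal value through the type-checked accessor
$doc.PageLayout = 'TwoColumnLeft';

# read the same entry back through raw hash access
say $doc<PageLayout>;
```

Both forms address the same underlying dictionary entry; only the accessor applies type checking on assignment.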

Scripts in this Distribution

pdf-append.raku --save-as=output.pdf in1.pdf in2.pdf ...

appends PDF files.

pdf-burst.raku --save-as=basename-%03d.pdf --password=pass in.pdf

bursts a multi-page PDF into single-page PDF files.

pdf-checker.raku --trace --render --strict --exclude=Entry1,Entry2 --repair input-pdf

This is a low-level tool for PDF authors and users. It traverses a PDF, checking its internal structure against PDF::Class definitions as derived from the PDF 32000-1:2008 1.7 specification.

Example 1: Dump a simple PDF

% pdf-checker.raku --trace t/helloworld.pdf
xref:   << /ID ... /Info 1 0 R /Root 2 0 R >>   % PDF::Class
  /ID:  [ "×C¨\x[86]üÜø\{iÃeH!\x[9E]©A" "×C¨\x[86]üÜø\{iÃeH!\x[9E]©A" ] % PDF::COS::Array[Str]
  /Info:        << /Author "t/helloworld.t" /CreationDate (D:20151225000000Z00'00') /Creator "PDF::Class" /Producer "Raku PDF::Class 0.2.5" >>        % PDF::COS::Dict+{PDF::Info}
  /Root:        << /Type /Catalog /Pages 3 0 R >>       % PDF::Catalog
    /Pages:     << /Type /Pages /Count 1 /Kids ... /Resources ... >>    % PDF::Pages
      /Kids:    [ 4 0 R ]       % PDF::COS::Array[PDF::Content::PageNode]
        [0]:    << /Type /Page /Contents 5 0 R /MediaBox ... /Parent 3 0 R >>   % PDF::Page
          /Contents:    << /Length 1944 >>      % PDF::COS::Stream
          /MediaBox:    [ 0 0 595 842 ] % PDF::COS::Array[Numeric]
      /Resources:       << /ExtGState ... /Font ... /ProcSet ... /XObject ... >>        % PDF::COS::Dict+{PDF::Resources}
        /ExtGState:     << /GS1 6 0 R >>        % PDF::COS::Dict[Hash]
          /GS1: << /Type /ExtGState /ca 0.5 >>  % PDF::COS::Dict+{PDF::ExtGState}
        /Font:  << /F1 7 0 R /F2 8 0 R /F3 9 0 R >>     % PDF::COS::Dict[PDF::Resources::Font]
          /F1:  << /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold /Encoding /WinAnsiEncoding >>  % PDF::Font::Type1
          /F2:  << /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>       % PDF::Font::Type1
          /F3:  << /Type /Font /Subtype /Type1 /BaseFont /ZapfDingbats >>       % PDF::Font::Type1
        /ProcSet:       [ /PDF /Text ]  % PDF::COS::Array[PDF::COS::Name]
        /XObject:       << /Im1 10 0 R /Im2 11 0 R >>   % PDF::COS::Dict[PDF::Resources::XObject]
          /Im1: << /Type /XObject /Subtype /Image /BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter /DCTDecode /Height 254 /Width 200 /Length 8247 >>  % PDF::XObject::Image
          /Im2: << /Type /XObject /Subtype /Image /BitsPerComponent 8 /ColorSpace ... /Height 42 /Width 37 /Length 1554 >>      % PDF::XObject::Image
            /ColorSpace:        [ /Indexed /DeviceRGB 255 12 0 R ]      % PDF::ColorSpace::Indexed
              [3]:      << /Length 768 >>       % PDF::COS::Stream
Checking of t/helloworld.pdf completed with 0 warnings and 0 errors

This example dumps a PDF and shows how PDF::Class has interpreted it.

The PDF contains one page (PDF::Page) that references various other objects, such as fonts and xobject images.

Example 2: Check a sample PDF

% wget http://www.stillhq.com/pdfdb/000025/data.pdf
% pdf-checker.raku --strict --render data.pdf
Warning: Error processing indirect object 27 0 R at byte offset 976986:
Ignoring 1 bytes before 'endstream' marker
Rendering warning(s) in 28 0 R (PDF::Page):
-- unexpected operation 'w' (SetLineWidth) used in Path context, following 'm' (MoveTo)
-- unexpected operation 'w' (SetLineWidth) used in Path context, following 'm' (MoveTo)
Rendering warning(s) in 30 0 R  (PDF::XObject::Form):
-- unexpected operation 'w' (SetLineWidth) used in Path context, following 'm' (MoveTo)
Unknown entries 1 0 R (PDF::Catalog) struct: /ViewPreferences(?ViewerPreferences)
Checking of /home/david/Documents/test-pdf/000025.pdf completed with 5 warnings and 0 errors

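In this example, the checker reports an extra byte before the 'endstream' marker of object 27 0 R, content streams in 28 0 R and 30 0 R where 'w' (SetLineWidth) follows 'm' (MoveTo) in a Path context, and an unknown /ViewPreferences entry in the catalog, for which it suggests ViewerPreferences.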

pdf-content-dump.raku --raku in.pdf

Displays the content streams for PDF pages, commented, and in a human-readable format:

% pdf-content-dump.raku t/example.pdf 
% **** Page 1 ****
BT % BeginText
  1 0 0 1 100 150 Tm % SetTextMatrix
  /F1 16 Tf % SetFont
  17.6 TL % SetTextLeading
  [ (Hello, world!) ] TJ % ShowSpaceText
  T* % TextNextLine
ET % EndText

The --raku option dumps using a Raku-like notation:

% pdf-content-dump.raku --raku t/example.pdf
# **** Page 1 ****
.BeginText();
  .SetTextMatrix(1, 0, 0, 1, 100, 150);
  .SetFont("F1", 16);
  .SetTextLeading(17.6);
  .ShowSpaceText($["Hello, world!"]);
  .TextNextLine();
.EndText();

pdf-info.raku in.pdf

Prints various PDF properties. For example:

% pdf-info.raku ~/Documents/test-pdfs/stillhq.com/000056.pdf 
File:         /home/david/Documents/test-pdfs/stillhq.com/000056.pdf
File Size:    63175 bytes
Pages:        2
Outlines:     no
Author:       Prince Restaurant
CreationDate: Wed Oct 03 23:41:01 2001
Creator:      FrameMaker+SGML 6.0
Keywords:     Pizza, Pasta, Antipasto, Lasagna, Food
ModDate:      Thu Oct 04 00:03:04 2001
Producer:     Acrobat PDFWriter 4.05  for Power Macintosh
Subject:      Take Out & Catering Menu
Title:        Prince Pizzeria & Bar
Tagged:       no
Page Size:    variable
PDF version:  1.3
Revisions:    1
Encryption:   no

pdf-fields.raku --password=pass --page=n --save-as=out.pdf [options] in.pdf

List, reformat or set PDF form fields.

pdf-revert.raku --password=pass --save-as=out.pdf in.pdf

undoes the last revision of an incrementally saved PDF file.

pdf-toc.raku --password=pass --/title --/labels in.pdf

prints a table of contents, showing titles and page-numbers, using PDF outlines.

% wget http://www.stillhq.com/pdfdb/000432/data.pdf
% pdf-toc.raku data.pdf
Linux Kernel Modules Installation HOWTO
  Table of Contents . . . i
  1. Purpose of this Document . . . 1
  2. Pre-requisites . . . 2
  3. Compiler Speed-up . . . 3
  4. Recompiling the Kernel for Modules . . . 4
    5.1. Configuring Debian or RedHat for Modules . . . 5
    5.2. Configuring Slackware for Modules . . . 5
    5.3. Configuring Other Distributions for Modules . . . 6

Note that outlines are an optional PDF feature. pdf-info.raku can be used to check if a PDF has them:

% pdf-info.raku my-doc.pdf | grep Outlines:
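
A similar check can be sketched in Raku using the raw hash access described under Gradual Typing. This assumes that a document with outlines has an /Outlines entry in its catalog. The has-outlines sub is a hypothetical helper, not part of this distribution, and pdf-info.raku may apply different logic:

```raku
use PDF::Class;

# hypothetical helper: true if the catalog has an /Outlines entry
sub has-outlines(PDF::Class $pdf --> Bool) {
    $pdf.catalog<Outlines>.defined;
}

my PDF::Class $pdf .= new;   # a new, empty document
say "Outlines: ", has-outlines($pdf) ?? "yes" !! "no";
```

To check an existing file, replace the `.= new` line with `.= open: "my-doc.pdf"`.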

Development Status

The PDF::Class module is under construction and not yet functionally complete.

Note: The roles and classes in this module are primarily based on roles generated by the PDF::ISO_32000 module. The PDF::Class module currently implements around 100 roles and classes of the 350+ objects extracted by PDF::ISO_32000.

Classes Quick Reference

| Class | Types | Accessors | Methods | Description | ISO-32000 References |
|---|---|---|---|---|---|
| PDF::Signature | dict | ByteRange, Cert, Changes, ContactInfo, Contents, Location, M(date-signed), Name, Prop_AuthTime, Prop_AuthType, Prop_Build, Reason, Reference, SubFilter, Type, V | | | Table_252-Entries_in_a_signature_dictionary |
| PDF::StructElem | dict | A(attributes), ActualText, Alt(alternative-description), C, E(expanded-form), ID, K(kids), Lang, P(parent), Pg(page), R(revision), S(tag), T(title), Type | | | Table_323-Entries_in_a_structure_element_dictionary |
| PDF::StructTreeRoot | dict | ClassMap, IDTree, K(kids), ParentTree, ParentTreeNextKey, RoleMap, Type | | | Table_322-Entries_in_the_structure_tree_root |
| PDF::ViewerPreferences | dict | CenterWindow, Direction, DisplayDocTitle, Duplex, FitWindow, HideMenubar, HideToolbar, HideWindowUI, NonFullScreenPageMode, NumCopies, PickTrayByPDFSize, PrintArea, PrintPageRange, PrintScaling, ViewArea, ViewClip | | | Table_150-Entries_in_a_viewer_preferences_dictionary |
| PDF::XObject::Form | stream | BBox, FormType, Group, LastModified, Matrix, Metadata, Name, OC(optional-content-group), OPI, PieceInfo, Ref, Resources, StructParent, StructParents, Subtype, Type | canvas, contents, contents-parse, core-font, find-resource, finish, gfx, graphics, has-pre-gfx, height, images, new-gfx, pre-gfx, pre-graphics, render, resource-entry, resource-key, save-as-image, text, tiling-pattern, use-font, use-resource, width, xobject-form | XObject Forms - /Type /XObject /Subtype Form See [PDF Spec 1.7 4.9 Form XObjects] | Table_95-Additional_Entries_Specific_to_a_Type_1_Form_Dictionary |
| PDF::XObject::Image | stream | Alternates, BitsPerComponent, ColorSpace, Decode, Height, ID, ImageMask, Intent, Interpolate, Mask, Metadata, Name, OC(optional-content), OPI, SMask, SMaskInData, StructParent, Subtype, Type, Width | height, image-obj, inline-content, inline-to-xobject, to-png, width | XObjects /Type XObject /Subtype /Image See [PDF 32000 Section 8.9 - Images] | Table_89-Additional_Entries_Specific_to_an_Image_Dictionary |
| PDF::XObject::PS | stream | Level1, Subtype, Type | | Postscript XObjects /Type XObject /Subtype PS See [PDF 32000 Section 8.8.2 PostScript XObjects] | Table_88-Additional_Entries_Specific_to_a_PostScript_XObject_Dictionary |

(generated by etc/make-quick-ref.pl)