Manipulate Docx Document With ABAP


Created by Jerry Wang on May 28, 2014 5:50 AM, last modified by Jerry Wang on May 28, 2014 6:02 AM. Version 2.

Office OpenXML
Using CL_DOCX_DOCUMENT to read word document
Using CL_DOCX_DOCUMENT to change word document

SAP provides a useful class, CL_DOCX_DOCUMENT, which supports read and write access to Word documents with the file extension ".docx". This document gives a brief introduction to its usage and can serve as a starting point for building your own application that needs to manipulate Word documents via ABAP.

Office OpenXML

Starting with Microsoft Office 2007, a newly created Word document is saved by default with the ".docx" file extension, which follows the Office OpenXML format. You can find its detailed definition on the wiki. As an example, I created a very simple Word document which contains a header area, a paragraph with three lines as body, and a picture.

According to the Office OpenXML format, a ".docx" file is a ZIP archive: after you change the file extension from ".docx" to ".zip", its icon changes to that of an archive file and it can be opened with WinRAR. All information about my sample document is spread across a series of XML files inside the archive (plus media files such as pictures, music and video, if the Word document contains any).

The most efficient way to study the format is to create a Word document yourself, change its extension to ".zip" and explore it.
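The same exploration can also be done from ABAP instead of WinRAR. The following is only a minimal sketch and not part of the original sample: it assumes the document binary is already available as an xstring, and the subroutine name list_docx_entries and its parameter are hypothetical. It uses the standard class CL_ABAP_ZIP to list the entries of the archive.

```abap
* Sketch: list the files contained in a .docx package.
* iv_content is assumed to hold the binary content of the .docx file.
FORM list_docx_entries USING iv_content TYPE xstring.

  DATA(lo_zip) = NEW cl_abap_zip( ).

  " A .docx file is a ZIP archive, so it can be loaded directly.
  lo_zip->load(
    EXPORTING
      zip             = iv_content
    EXCEPTIONS
      zip_parse_error = 1
      OTHERS          = 2 ).
  IF sy-subrc <> 0.
    WRITE / 'The content is not a valid ZIP archive.'.
    RETURN.
  ENDIF.

  " Public attribute FILES holds one row per archive entry.
  LOOP AT lo_zip->files INTO DATA(ls_file).
    WRITE: / ls_file-name, ls_file-size.
  ENDLOOP.

ENDFORM.
```

For a document created by Word you would typically see entries such as [Content_Types].xml, word/document.xml and docProps/core.xml, with embedded pictures stored under word/media/.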

Using CL_DOCX_DOCUMENT to read word document

I use the following sample code to explain how to use this class. To avoid unnecessary local variable declarations, I use the "inline declaration" feature available as of release 7.40. If that release is not available to you, just replace the inline declarations with conventional explicit declarations of local variables.

    DATA: lo_document TYPE REF TO cl_docx_document,
          lv_content  TYPE xstring.

    " Read the file into an xstring and create the document instance.
    PERFORM get_doc_binary USING    'C:\Users\i042416\Desktop\test.docx'
                           CHANGING lv_content.
    lo_document = cl_docx_document=>load_document( lv_content ).
    CHECK lo_document IS NOT INITIAL.

    " Core properties part: author, creation date and so on, as XML.
    DATA(lo_core_part) = lo_document->get_corepropertiespart( ).
    DATA(lv_core_data) = lo_core_part->get_data( ).

    " Main document part and the pictures embedded in it.
    DATA(lo_main_part)   = lo_document->get_maindocumentpart( ).
    DATA(lo_image_parts) = lo_main_part->get_imageparts( ).
    DATA(lv_image_count) = lo_image_parts->get_count( ).
    DO lv_image_count TIMES.
      " get_part expects a 0-based index, sy-index starts at 1.
      DATA(lo_image_part) = lo_image_parts->get_part( sy-index - 1 ).
      DATA(lv_image_data) = lo_image_part->get_data( ).
    ENDDO.

    " Header parts are read in exactly the same way.
    DATA(lo_header_parts) = lo_main_part->get_headerparts( ).
    DATA(lv_header_count) = lo_header_parts->get_count( ).
    DO lv_header_count TIMES.
      DATA(lo_header_part) = lo_header_parts->get_part( sy-index - 1 ).
      DATA(lv_header_data) = lo_header_part->get_data( ).
    ENDDO.
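The listing calls subroutine get_doc_binary, whose source is not shown here. A minimal sketch of what such a subroutine could look like is given below. It is an assumption and not the code from the attachment: it assumes the file is located on the presentation server (front end), and the parameter names iv_path and cv_content are hypothetical. It uses CL_GUI_FRONTEND_SERVICES=>GUI_UPLOAD and function module SCMS_BINARY_TO_XSTRING.

```abap
* Sketch: read a front-end file in binary mode and return it as xstring.
* The path is typed generically so that a character literal can be passed.
FORM get_doc_binary USING    iv_path    TYPE csequence
                    CHANGING cv_content TYPE xstring.

  DATA lt_raw  TYPE solix_tab.
  DATA lv_size TYPE i.

  CLEAR cv_content.

  " Upload the file as a table of raw lines plus the exact byte length.
  cl_gui_frontend_services=>gui_upload(
    EXPORTING
      filename   = CONV string( iv_path )
      filetype   = 'BIN'
    IMPORTING
      filelength = lv_size
    CHANGING
      data_tab   = lt_raw
    EXCEPTIONS
      OTHERS     = 1 ).
  IF sy-subrc <> 0.
    RETURN. " cv_content stays initial, the caller has to check it
  ENDIF.

  " Concatenate the raw lines into one xstring, cut to the real length.
  CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
    EXPORTING
      input_length = lv_size
    TABLES
      binary_tab   = lt_raw
    IMPORTING
      buffer       = cv_content
    EXCEPTIONS
      failed       = 1
      OTHERS       = 2.
  IF sy-subrc <> 0.
    CLEAR cv_content.
  ENDIF.

ENDFORM.
```

Passing the file length matters because the last line of the uploaded table is usually only partly filled. Without it the xstring would carry trailing bytes and the package might no longer load as a valid archive.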

Comments:

1. You can get an instance of the Word document via method cl_docx_document=>load_document. It is necessary to pass the document's binary data, with type xstring, into this method. I don't list the source code of subroutine get_doc_binary as it is not relevant; you can find it in the attachment.

2. The system administrative data such as author, creation date and last modification date are stored in the so-called "core properties part", which can be fetched via the document instance obtained in step 1. Once you have the instance of the core properties part, you can get its binary data via method get_data(). The returned data is in XML format (as is the data of all the other kinds of parts in this document), so it can easily be parsed with a DOM or SAX parser.

3. From the document instance we can get the main part instance. Its binary data includes the text of all three body lines together with their font colors.

4. The binary data of all pictures embedded in the Word document can be retrieved in two steps: first get the image part collection from the main part instance, then loop over each image part instance in the collection. The get_part method accepts an index starting from 0. The way to read header block information is exactly the same.

Using CL_DOCX_DOCUMENT to change word document

See the nice document "How to - Add Custom XML Parts to Microsoft Word using ABAP" from Leon Limson. You could also achieve the same requirement with the respective class below.
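To give a rough idea of what a change could look like, here is a minimal sketch under stated assumptions. It is not taken from the referenced document. The subroutine name replace_docx_text and all its parameters are hypothetical, and I assume that the part method feed_data( ) accepts the new XML as xstring and that get_package_data( ) returns the complete changed package. Please verify both signatures in your system before relying on them.

```abap
* Sketch: replace a text fragment in the main document part and
* return the changed .docx package as xstring.
FORM replace_docx_text USING    iv_content TYPE xstring
                                iv_old     TYPE csequence
                                iv_new     TYPE csequence
                       CHANGING cv_result  TYPE xstring.

  CLEAR cv_result.
  IF iv_old IS INITIAL.
    RETURN. " nothing to search for
  ENDIF.

  TRY.
      DATA(lo_document)  = cl_docx_document=>load_document( iv_content ).
      DATA(lo_main_part) = lo_document->get_maindocumentpart( ).

      " The part data is XML; convert it to a string (UTF-8 is the default).
      DATA(lv_xml) = cl_abap_codepage=>convert_from( lo_main_part->get_data( ) ).

      " Escape both texts so that they match and stay valid XML text.
      DATA(lv_old) = escape( val = iv_old format = cl_abap_format=>e_xml_text ).
      DATA(lv_new) = escape( val = iv_new format = cl_abap_format=>e_xml_text ).
      REPLACE ALL OCCURRENCES OF lv_old IN lv_xml WITH lv_new.

      " Write the changed XML back into the part (assumed signature).
      lo_main_part->feed_data( cl_abap_codepage=>convert_to( lv_xml ) ).

      " Serialize the whole package again (assumed signature).
      cv_result = lo_document->get_package_data( ).

    CATCH cx_root.
      " Sketch only: any error leaves cv_result initial.
      CLEAR cv_result.
  ENDTRY.

ENDFORM.
```

A plain string replacement only works when the searched text is stored contiguously in the XML. Word often splits a sentence into several runs, so for anything beyond a quick experiment a DOM-based change of the main part is the more robust choice.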