Convert DOCX to PDF using homemade Java code
We need portability so we can work with our documents anywhere, and that need grows from day to day. A PDF (Portable Document Format) file can be opened on any budget Android smartphone without breaking a sweat, so we have written some Java code to convert Word documents to PDF.
There are many programs that can convert documents, but none of them is quite like homemade code: if your documents are confidential, it is better to use this homemade code to convert them to PDF, because the files are processed on your own computer.
This program lets you convert one or many Microsoft Word (2007+, .docx) files to PDF files. It extracts the text, the images, and the font colors, sizes and styles used in each Word file, and then places them in the PDF file it generates.
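A Word run can be bold, italic, underlined and struck through at the same time, so it helps that iText font styles are bit flags that can be combined with a bitwise OR. Here is a minimal, JDK-only sketch of that idea. The class StyleFlags and its method combine are hypothetical names, and the numeric values are assumed to mirror iText 5's Font.NORMAL, Font.BOLD, Font.ITALIC, Font.UNDERLINE and Font.STRIKETHRU constants; in real code, use the library constants instead of copying the numbers.

```java
/** Hypothetical helper: combines Word run formatting into one font style bitmask. */
final class StyleFlags {
    // Assumption: these values mirror iText 5's com.itextpdf.text.Font style constants.
    static final int NORMAL = 0;
    static final int BOLD = 1;
    static final int ITALIC = 2;
    static final int UNDERLINE = 4;
    static final int STRIKETHRU = 8;

    /** Each style occupies its own bit, so any combination can be OR-ed together. */
    static int combine(boolean bold, boolean italic, boolean underline, boolean strike) {
        int style = NORMAL;
        if (bold) style |= BOLD;
        if (italic) style |= ITALIC;
        if (underline) style |= UNDERLINE;
        if (strike) style |= STRIKETHRU;
        return style;
    }

    public static void main(String[] args) {
        // a bold, struck-through run keeps both styles
        System.out.println(combine(true, false, false, true)); // prints 9
    }
}
```

Because BOLD and ITALIC occupy different bits, combine(true, true, false, false) gives 3, the same value as iText's Font.BOLDITALIC.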
The main APIs used in this program are Apache POI and iText (version 5, the com.itextpdf.text packages). Apache POI is used to extract information from a Microsoft Word file, while iText is used to create the PDF file.
This is the original Microsoft Word file:
This is the PDF file generated from the original Microsoft Word file:
```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;

public class WordToPdfConverter {

    public static void main(String[] args) {
        selectFiles();
    }

    public static void selectFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("Microsoft Word 2007+", "docx");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (File file : files) {
                String wordFile = file.getPath();
                // use the last dot, so a folder such as "my.docs" does not cut the path short
                String pdfFile = wordFile.substring(0, wordFile.lastIndexOf('.')) + ".pdf";
                convertWordToPdf(wordFile, pdfFile);
            }
            System.out.println("Conversion complete");
        }
    }

    public static void convertWordToPdf(String src, String dest) {
        // try-with-resources closes the input file, the Word document and the output file
        try (FileInputStream fs = new FileInputStream(src);
             XWPFDocument doc = new XWPFDocument(fs);
             FileOutputStream out = new FileOutputStream(dest)) {
            // A4 page with 72-unit margins (72 units = 1 inch)
            Document pdfDoc = new Document(PageSize.A4, 72, 72, 72, 72);
            // the writer sends the PDF content to the destination file
            PdfWriter writer = PdfWriter.getInstance(pdfDoc, out);
            // vertical space between the lines of text
            writer.setInitialLeading(20);
            pdfDoc.open();
            // read through the paragraphs, then through the runs inside each paragraph
            for (XWPFParagraph paragraph : doc.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    // copy the pictures embedded in this run into the PDF
                    for (XWPFPicture pic : run.getEmbeddedPictures()) {
                        pdfDoc.add(Image.getInstance(pic.getPictureData().getData()));
                    }
                    // style flags are bits, so bold, italic and strikethrough can combine
                    int style = Font.NORMAL;
                    if (run.isBold()) style |= Font.BOLD;
                    if (run.isItalic()) style |= Font.ITALIC;
                    if (run.isStrikeThrough()) style |= Font.STRIKETHRU;
                    int color = getCode(run.getColor());
                    // the (r, g, b) constructor gives a fully opaque color
                    BaseColor baseColor = new BaseColor((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF);
                    // getFontSize() returns -1 when the size is inherited; iText then uses its default size
                    Font font = FontFactory.getFont(FontFactory.TIMES_ROMAN, run.getFontSize(), style, baseColor);
                    // Java strings are already Unicode, so the text needs no re-encoding
                    String text = run.text();
                    if (text != null && !text.isEmpty()) {
                        pdfDoc.add(new Chunk(text, font));
                    }
                }
                // start a new line after each paragraph
                pdfDoc.add(Chunk.NEWLINE);
            }
            pdfDoc.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Word stores colors as "RRGGBB"; "auto" or no value means the default color (black here)
    public static int getCode(String code) {
        if (code == null || code.equalsIgnoreCase("auto")) {
            return 0x000000;
        }
        return Integer.parseInt(code, 16);
    }
}
```
In the code above, the XWPFDocument class (from the POI library) represents a Microsoft Word file. Its constructor accepts a FileInputStream, which is used to read the Microsoft Word file. Once you have a document object that contains all the data of the original Microsoft Word file, you can get the paragraph objects inside it with the getParagraphs() method. This method returns the paragraphs in the body of the document (paragraphs inside tables are not included). Each paragraph object contains smaller items called run objects, each of which is a stretch of text that shares the same formatting. From each run object you can extract the text, the images, and the formatting styles that were applied to the text when the Microsoft Word file was written.
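To see the paragraph and run structure that XWPFDocument wraps, it helps to know that a .docx file is a ZIP package whose main text usually lives in word/document.xml, where paragraphs are w:p elements and runs are w:r elements in the WordprocessingML namespace. The JDK-only sketch below (the class DocxStructure and its methods are hypothetical names) opens that part and counts both. It only illustrates the file format and is not a replacement for Apache POI.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical helper: counts paragraphs (w:p) and runs (w:r) in a .docx file. */
final class DocxStructure {
    // WordprocessingML main namespace, bound to the w: prefix in document.xml
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Opens the .docx (a ZIP package) and counts paragraphs and runs in its main part. */
    static int[] countParagraphsAndRuns(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("no word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countInDocumentXml(in.readAllBytes());
            }
        }
    }

    /** Returns {paragraphs, runs}: every w:p and w:r element, at any nesting depth. */
    static int[] countInDocumentXml(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            // refuse DOCTYPE declarations (guards against XXE in untrusted files)
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return new int[] {
                doc.getElementsByTagNameNS(W_NS, "p").getLength(),
                doc.getElementsByTagNameNS(W_NS, "r").getLength()
            };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document.xml", e);
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = countParagraphsAndRuns(args[0]);
        System.out.println(counts[0] + " paragraphs, " + counts[1] + " runs");
    }
}
```

Note that getElementsByTagNameNS searches the whole tree, so paragraphs inside tables are counted too, whereas XWPFDocument.getParagraphs() only returns the paragraphs directly in the document body.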
Once you have the text, images, and formatting styles data, you can write them to the destination pdf file by using classes and methods from the iText library.
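One detail when writing runs to the PDF is the color. Word stores a run's color as a six-digit hex string such as FF0000, or as the word auto (or no value at all) for the default color, and POI's getColor() returns that string as-is. The JDK-only sketch below, using a hypothetical RunColor class, turns the string into red, green and blue components, the form accepted by iText 5's BaseColor(int red, int green, int blue) constructor. Treating auto as black is an assumption that matches the usual look of text on a white page.

```java
/** Hypothetical helper: converts a Word run color string to RGB components. */
final class RunColor {

    /** Parses "RRGGBB"; null or "auto" (Word's default color) becomes black (assumption). */
    static int toRgb(String wordColor) {
        if (wordColor == null || wordColor.equalsIgnoreCase("auto")) {
            return 0x000000;
        }
        // throws NumberFormatException for anything that is not a hex number
        return Integer.parseInt(wordColor, 16);
    }

    /** Splits 0xRRGGBB into {red, green, blue}, each in the range 0-255. */
    static int[] components(int rgb) {
        return new int[] {(rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF};
    }

    public static void main(String[] args) {
        int[] c = components(toRgb("1F497D"));
        System.out.println(c[0] + "," + c[1] + "," + c[2]); // prints 31,73,125
    }
}
```

With iText 5 on the classpath, the three components would then be passed as new BaseColor(c[0], c[1], c[2]).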
Please take the time to comment on our articles and code; it will really help us improve.