How to extract binary files (Powerpoint, PDF etc.) from Oracle Portal

From: Ben Engbers (J.B.O.M.Engbers_at_ln.vnlnim.R-e-v-e-r-s-e.D-o-m-a-i-n)
Date: 09/21/04


Date: Tue, 21 Sep 2004 12:54:56 +0200

Hi,

I need to extract all the content that is stored in an Oracle Portal
database (8.1.xxx) and save it to a file system.

The first step is to extract all the information on the items from the
tables portal30.wwv_things and portal30.wwv_corners.
After some transformations, I can construct a call to CopyByteItem.copy().
The code for this method is as follows:

   InputStream is;
   public void copy(int ID, String wd, String inputFile, String outputFile) throws IOException {
     String tmpUrl, tmpName;
     // Split the path into its directory part and the URL-encoded file name
     tmpUrl = wd.substring(0, wd.lastIndexOf("/") + 1);
     tmpName = URLEncoder.encode(wd.substring(wd.lastIndexOf("/") + 1, wd.length()), "UTF-8");
     URL url = new URL(bronSite + tmpUrl + tmpName);
     System.out.println(url.openConnection().getContentType());
     // Create inputstream
     is = url.openStream();
     BufferedInputStream bis = new BufferedInputStream(is);
     // Create outputstream
     File outPut = new File(outputFile);
     FileOutputStream out = new FileOutputStream(outPut);

     synchronized (bis) {
       synchronized (outPut) {
         // Copy the response to the output file in 256-byte chunks
         byte[] ins = new byte[256];
         for (;;) {
           int i = bis.read(ins);
           if (i == -1)
             break;
           out.write(ins, 0, i);
         }
       }
     }
     is.close();
     out.close();
   }

When I issue the following four test calls, the first two give me
the desired output:
try {
   CopyByteItem.copy(73937,
     "/docs/Folder/DLG_INTRANET/DLG_HOME/WERKVELDEN/DIENSTBREDE_PROJ/OESTERS_EN_PARELS/AESHIPDI.HTML",
     "AESHIPDI.HTML", outputDir+"10157.HTML");
   CopyByteItem.copy(182520,
     "/docs/Folder/DLG_INTRANET/NOORD_BRABANT_HOME/WERKVELDEN/PROV_PROJECTEN/RECONSTRUCTIE/RECONSTRUCTIEBULLETINS/PERSBERSTUWENI.DOC",
     "PERSBERSTUWENI.DOC", outputDir+"42744.DOC");
   CopyByteItem.copy(24254,
     "/docs/Folder/DLG_INTRANET/DLG_HOME/WERKVELDEN/RESERVE/GEOPLAZA_OUD/GEBRUIKEN/GIS IN PROJECTEN.PDF",
     "GIS IN PROJECTEN.PDF", outputDir+"1606.PDF");
   CopyByteItem.copy(182590,
     "/docs/Folder/DLG_INTRANET/NOORD_HOLLAND_HOME/WERKVELDEN/REGELINGEN/POPNH/PP PRESENTATIE DLG NH OKT 2003.PPT",
     "PP PRESENTATIE DLG NH OKT 2003.PPT", outputDir+"42784.PPT");
} catch(IOException e) {
   // Message is Dutch for "Problem while copying: "
   System.out.println("Probleem bij het copieren: "+e.getMessage());
}

For these first two calls,
System.out.println(url.openConnection().getContentType()) prints
text/html and application/octet-stream respectively.

The last two calls, however, produce output similar to this:
<HTML>
<BODY bgColor="#FFFFFF" onLoad="document.LoginForm.submit();">
<SCRIPT LANGUAGE="JavaScript">
function show_context_help(h) {
     newWindow = window.open(h,"ContextHelp",
"menubar=1,scrollbars=1,resizable=1,width=600, height=400");
}
</SCRIPT>
<FORM
ACTION="http://>/pls/portal30_sso/portal30_sso.wwsso_app_admin.ls_login"
METHOD="POST" name="LoginForm">
<INPUT TYPE="hidden" NAME="site2pstoretoken"
VALUE="v1.1~1321~37D032EEA1FD88FFBE1FF8591FB4C1A27141BADA2D7997CD1DF2EC87860A6872CD966A6FCABE1C1BC17044DA0AB0EE7663AB8299311EB6B3193BE5AE5AF02225CD5CFF23980733834426B29B4833294C8A6BDEF1513391E5928EEB03235BAC0561E76FDE0A6E7223349A1DFA756DDDDFEBDCA044E0042ED763BC6AD56108D54846B57600E288A59E6890404BC097F6D9609E3E046D8F874BEBFF6D70A8E59FD3FA73B1C787F691E5298AB03D2787F1A73B2911783BF42B8CC92AB95D69FAFF676A671405C17166F591E70A376D082854274C8800BD77AA85F8C870B06431B48495E1ACA1878611EC162FC6BA96C26C240FFC25544852B2BF7499C4CE1F2930442CE35157321782C8DBE942F9999C37F99BD754DFD41EC09733C6771657DE12FBEBEF42E7C8A7244F">
</FORM>
</BODY>
</HTML>

and System.out.println(url.openConnection().getContentType()) now
results in text/html.
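
Since the login form arrives with content type text/html, that header can serve as a rough signal that an item was not actually delivered. The following is a minimal sketch, assuming that only .HTML/.HTM items should legitimately come back as text/html; the class and method names are hypothetical and not part of CopyByteItem.

```java
import java.util.Locale;

public class ResponseCheck {

    // Returns true when an item that is not an HTML file came back as
    // text/html. In the output above this coincided with the SSO login
    // form being returned instead of the document (assumption: binary
    // items are never served as text/html).
    static boolean looksLikeLoginRedirect(String fileName, String contentType) {
        if (contentType == null) {
            return false; // no header available, nothing to conclude
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        boolean expectsHtml = name.endsWith(".html") || name.endsWith(".htm");
        boolean gotHtml = contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
        return gotHtml && !expectsHtml;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeLoginRedirect("AESHIPDI.HTML", "text/html"));
        System.out.println(looksLikeLoginRedirect("GIS IN PROJECTEN.PDF", "text/html"));
    }
}
```

Such a check could be called with the value of url.openConnection().getContentType() before the stream is written to disk, so that a login page is reported rather than saved under a .PDF or .PPT name.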

I guess that the MIME-type is not set. (I don't know if this can be done
by the client).
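
A possibly related observation: the two failing paths both contain spaces in the file name, while the two working ones do not. URLEncoder.encode applies form encoding, which turns a space into '+', whereas a space in a URL path is normally written as %20. Whether this is what sends the request to the login page is an assumption and has not been tested against Portal; the sketch below only shows the difference between the two encodings. The class and method names are hypothetical, and URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathSegmentEncoding {

    // Form-style encoding, as used in copy(): a space becomes '+'.
    static String formEncode(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8);
    }

    // Path-style encoding (assumption: the server expects %20 for a space).
    // URLEncoder writes a literal '+' as %2B, so every '+' left in its
    // output stands for a space and can be replaced safely.
    static String pathEncode(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String name = "GIS IN PROJECTEN.PDF";
        System.out.println(formEncode(name)); // GIS+IN+PROJECTEN.PDF
        System.out.println(pathEncode(name)); // GIS%20IN%20PROJECTEN.PDF
    }
}
```

Names without spaces, such as AESHIPDI.HTML, come out identical under both encodings, which would be consistent with the first two calls working.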

Does anybody know how I can extract all the content from the Oracle
database without having to set the MIME type manually for each item?

Thanks,
Ben


