Re: Very slow Tcl C extension ...



On 23 mar, 09:49, Luc Moulinier <luc.moulin...@xxxxxxxx> wrote:
Hello All !

I hope you're doing well this sunny spring!
I made an extension to compute the correspondence between general and
local coordinates in amino acid sequences of the same length. My extension
should deliver two dictionaries, one for global -> seq coords, one for
seq -> global coords. I pass my extension a list of the form:
NameOfSeq1 seq1 NameOfSeq2 seq2 ... etc.
Example :
Seq1 .A..CDE.FTW.
in CorrGS    in CorrSG
   0 0         0 1
   1 1         1 4
   2 1         2 5
   3 1         3 6
   4 2         4 8
Then [dict get $CorrGS Seq1 3] -> 1
     [dict get $CorrSG Seq1 3] -> 6

Here is the C code for that extension. Two problems:
 - this code is extremely slow! (each sequence is 1000 - 20000 long,
and there are 300 - 800 sequences)
 - when issuing a [dict keys $CorrGS], all keys are of the form
"Seq1,12" but I never add a comma anywhere ...

Any advice? Many thanks!
Luc

  TSG = Tcl_NewDictObj();
  TGS = Tcl_NewDictObj();

  /* catch the sequence length */
  sq0 = Tcl_GetByteArrayFromObj(Lseq[1],&slen);
  seq = (char *)ckalloc((slen+1) * sizeof(char));
  nom = (char *)ckalloc( 31 * sizeof(char));
  for (i=0;i<nseq;i+=2) {
    nom = Tcl_GetString(Lseq[i]);
    ONom = Tcl_NewStringObj(nom, -1);
    seq = Tcl_GetByteArrayFromObj(Lseq[i+1],&slen);
    Lsg[0] = ONom;
    Lgs[0] = ONom;

    OLgt = Tcl_NewStringObj("lgt",3);
    g = s = -1;
    Os = Tcl_NewIntObj(0);
    /* loop over the seq. g is general, s is seq pos */
    for (j=0;j<slen;j++) {
      g++;
      Og = Tcl_NewIntObj(g);
      if (seq[j] != '.') {
        s++;
        Os = Tcl_NewIntObj(s);
        Lsg[1] = Os;

        Tcl_DictObjPutKeyList(interp, TSG, 2, Lsg,Og);
      }

      Lgs[1] = Og;
      Tcl_DictObjPutKeyList(interp, TGS, 2, Lgs, Os);
    }
    Lgs[0] = ONom;
    Lgs[1] = OLgt;
    s++;
    Tcl_DictObjPutKeyList(interp, TGS, 2, Lgs,Tcl_NewIntObj(s));
    Tcl_DictObjPutKeyList(interp, TSG, 2, Lgs,Tcl_NewIntObj(s));
  }

  Res = Tcl_NewListObj(0,NULL);
  Tcl_ListObjAppendElement(interp,Res,TGS);
  Tcl_ListObjAppendElement(interp,Res,TSG);

  Tcl_SetObjResult(interp,Res);

I know it sounds like a recurring remark, but ... are you sure you
still need an extension?
Considering that your code consists mostly of dict operations (as
opposed to pure math computations), I'd guess it will not be much
faster than the equivalent script, *and* you take on all the trickery
of Tcl_Obj lifecycle management (upon which you've already stumbled, as
Donal shows). Please give details about why [exec/open |] is not suitable.
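
If an extension does turn out to be warranted, one option is to keep
the per-position work in plain C arrays and build Tcl values only where
they are really needed. At the upper bounds you quote (800 sequences of
20000 columns) the posted loop makes on the order of 16 million dict
insertions per dictionary. The sketch below is a minimal illustration
under stated assumptions, not your code: map_coords is a hypothetical
helper name, the caller is assumed to provide gs and sg with room for
len ints each, and the numbering follows your example table (gs counts
the residues seen so far, sg uses 0-based residue indices).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the posted extension).
 * seq : aligned sequence of len columns, '.' marks a gap
 * gs  : out, gs[g] = number of residues in columns 0..g
 * sg  : out, sg[s] = column of the residue with 0-based index s
 * Returns the number of residues (the "lgt" entry).
 */
int map_coords(const char *seq, int len, int *gs, int *sg)
{
    int g, s = 0;

    assert(seq != NULL && gs != NULL && sg != NULL);

    for (g = 0; g < len; g++) {
        if (seq[g] != '.') {
            sg[s] = g;   /* residue s sits in column g */
            s++;
        }
        gs[g] = s;       /* residues seen up to and including column g */
    }
    return s;
}
```

For the example sequence .A..CDE.FTW. (12 columns) this gives
gs[3] = 1 and sg[3] = 6, matching the two [dict get] results in your
example, and a return value of 7.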

-Alex