
Thread: Parse bug?

  1. #1
    Chrusion (gold plated 3D)
    Join Date: Mar 2003
    Location: Chattanooga, TN
    Posts: 1,052

    Parse bug?

    I have a comma-delimited (CSV) file in which a few cells have no data (nil). When I use the parse method in my script, the token array returned for such a line is compressed. That is, parsing the line ",,2.5,5.1" results in the token array "2.5,5.1,,", and an input of "101,,,404" results in "101,404,,", etc.

    IOW, parse has a fatal flaw (maybe by design, but WHY?): it ignores nil cells and shifts each following non-nil value into the next available cell of the token array, instead of keeping the values in their original positions with nil cells between them. This malformed token array breaks my script, which checks for nil data that's expected to be in cells [1] and [2]. I have no control over how the CSV file is generated.

    How can this be worked around? Suppose a CSV file has 2.5 million lines and is generated by an online database search/retrieval system. How would you instruct someone who gets such files to use them with my script, and how would they even know the data needs editing to "fix" it, when you have no idea who they are or when they will use the script? It would be far better to have a parser that simply leaves a token cell nil wherever it finds no data between separators and does NOT shift non-nil values downward, compressing the token array.
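    To make the difference concrete, here is a small Python sketch (Python stands in for LScript, which only runs inside LightWave). split_compressed drops empty fields, mimicking the compressed arrays described above, while split_positional keeps one cell per field with None marking the missing data. Both function names are hypothetical; neither is LScript's parse().

```python
def split_compressed(line, sep=","):
    """Drop empty fields, as parse() is described doing above: later values shift left."""
    return [field for field in line.split(sep) if field]


def split_positional(line, sep=","):
    """Keep one cell per field; empty fields become None, so each value stays in its column."""
    return [field if field else None for field in line.split(sep)]


for line in (",,2.5,5.1", "101,,,404"):
    print(line, "->", split_compressed(line), "vs", split_positional(line))
```

    With the positional version, a check on cells 1 and 2 (indices 0 and 1 in Python) sees None for ",,2.5,5.1" instead of the shifted values "2.5" and "5.1".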
    Last edited by Chrusion; 03-16-2017 at 11:28 AM.
    Dean A. Scott, mfa
    Senior 3D Animator and Graphic Design Illustrator, @ Astec, Inc.
    Owner / Lead Artist @ chrusion | FX

  2. #2
    Sensei (TrueArt Support)
    Join Date: Feb 2003
    Location: Poland
    Posts: 7,290
    According to ftp://ftp.newtek.com/products/LightW...tReference.pdf, there are two parse() functions:

    1st one

    parse
    parse()
    accepts two parameters: the first is a string of potential
    token separators and the second is the character string to process.
    This function returns an array of tokens that make up the string
    (second parameter).
    str = "23,45,69.6,100";
    tokens = parse(",",str);
    // returns "23", "45", "69.6" and "100"


    2nd

    parse(string)
    parse(string)
    reads a line from the file, and returns a variable
    number of elements representing the tokens of the line. These
    elements are stored in a character array. Tokens are separated by
    one of the characters provided in the character string argument.
    This function is illegal in Binary mode.

    It appears to be a method of the File Object Agent.



    Which one are you talking about and using?

  3. #3
    Chrusion (gold plated 3D)
    Hmmm... thinking out loud here... what if I used the read() method to ingest the line as is, then did a regex search-and-replace on it?

    I know nothing about regex, but a Google search for replacing nil (empty) fields in CSV lines turned up this sed-style substitution expression: s/^(?=,)|(?<=,)(?=,|$)/nil/g

    The LScript docs leave way too much out, so I have no idea how to implement this, or even whether it's valid. Any help out there?

    I came across the LScript 2.0 release notes at http://lw9sdk.tribbeck.com/html/lscript/v20.html, which say you can assign a regular expression to a variable like this: sr = r~s/^(?=,)|(?<=,)(?=,|$)/nil/g; where r~ is the new regex operator that tells LScript that what follows is an expression. BUT LScript 2.6 barfs on the tilde (~). Why? So what I'm left with is the regexp() command.

    Anyway, if I can do the "editing" within LScript to put the word "nil" into every empty field, then I can simply change the conditional test from != nil to != "nil" after running the line through the parse method. The question is: is "s/^(?=,)|(?<=,)(?=,|$)/nil/g" valid, and if not, how do you make it valid?
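    As a check of the expression itself (not of LScript's regexp(), whose signature isn't covered here), the pattern can be tried with Python's re module. The lookarounds need a Perl-compatible engine; POSIX and GNU sed don't support them. Every match is zero-width: the start of a line that begins with a comma, or any spot right after a comma that is followed by another comma or by the end of the line. Substituting "nil" at those spots fills each empty field. fill_empty_fields is a hypothetical helper name.

```python
import re

# Zero-width matches marking every empty field:
#   ^(?=,)          start of the line, when the line begins with a comma
#   (?<=,)(?=,|$)   right after a comma, when another comma or the end of the line follows
EMPTY_FIELD = re.compile(r"^(?=,)|(?<=,)(?=,|$)")


def fill_empty_fields(line, placeholder="nil"):
    """Insert `placeholder` into every empty comma-separated field of `line`."""
    return EMPTY_FIELD.sub(placeholder, line)


for line in (",,2.5,5.1", "101,,,404", "1,2,"):
    print(fill_empty_fields(line))
```

    After this substitution no field is empty, so a splitter that drops empty tokens can no longer shift values out of place.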
    Last edited by Chrusion; 03-16-2017 at 12:54 PM.

  4. #4
    Chrusion (gold plated 3D)
    Quote Originally Posted by Sensei
    According to
    ftp://ftp.newtek.com/products/LightW...tReference.pdf
    there are two parse() functions. .. Which one you're talking about and using.. ?
    I'm using the method (the second one), not the command (the first one), i.e. CSVfile.parse(","), which reads one line of the file and splits it at every comma.
    Last edited by Chrusion; 03-16-2017 at 12:45 PM.

  5. #5
    Chrusion (gold plated 3D)
    Hmmm... I put the expression above through a regex validator and it modified it to: ^(?=,)|(?<=,)(?=,|$)

    So, I replaced it in the script and now Modeler crashes.

    The code in question, which sits inside a conditional that fires when the offending line number is reached, is:

    Code:
    line = CSVfile.read();
    // intended: put "nil" into every empty field, e.g. ",,2.5" -> "nil,nil,2.5", "1,,2" -> "1,nil,2", "1," -> "1,nil"
    // note: as written, neither the string to search (line) nor the replacement text ("nil") is passed to regexp()
    line = regexp("^(?=,)|(?<=,)(?=,|$)");
    error(line); // stop the script and print the value of line

  6. #6
    Chrusion (gold plated 3D)
    The parse() command does the same thing (it ignores nils, moving data down into the next "empty" cell):

    Code:
    in = CSVfile.read();  // line in  = ",,303,404" or ",,,,505,,,,909"
    out = parse(",", in); // line out = "303,404,," or "505,909,,,,,,,"
    Last edited by Chrusion; 03-16-2017 at 02:15 PM.

  7. #7
    ernpchan
    I think parse is working as expected. What would you expect it to return if nothing was there?

    You could make your own parser. Find the positions of the commas and extract substrings based on them. You'd want to add condition checks to see whether there's actual data between consecutive comma positions; if there's no valid data, insert whatever placeholder you want. I'm guessing you're trying to generate a 3-item array for (x,y,z) values.
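    A minimal Python sketch of that position-based approach, assuming one line of text at a time. parse_by_positions is a hypothetical name; it locates each separator with str.find, slices out the text in between, and substitutes a caller-chosen placeholder (None by default) when a slice is empty.

```python
def parse_by_positions(line, sep=",", empty=None):
    """Split `line` by locating each separator and slicing the text between them.

    A field with no characters (before the first separator, between two adjacent
    separators, or after a trailing one) becomes `empty`, so every field keeps its position.
    """
    fields = []
    start = 0
    while True:
        pos = line.find(sep, start)          # index of the next separator, or -1 if none left
        end = len(line) if pos == -1 else pos
        chunk = line[start:end]              # text between the previous and next separator
        fields.append(chunk if chunk else empty)
        if pos == -1:                        # no more separators: the last field was just added
            return fields
        start = pos + len(sep)               # continue just past this separator
```

    For an (x,y,z) line with a missing y, parse_by_positions("1.5,,3.0", empty="0") gives ["1.5", "0", "3.0"].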
    Last edited by ernpchan; 03-16-2017 at 02:21 PM.
    My opinions and comments do not represent those of my employer.
    www.ernestpchan.com
    www.zazzle.com/gopuggo

  8. #8
    Chrusion (gold plated 3D)
    Quote Originally Posted by ernpchan
    I think parse is working as expected. What would you expect it to return if nothing was there?
    Lol... did you not read my posts? Nothing. I want NOTHING in those array cells. Why in the world would anyone want a data set all shifted out of place?

    You could make your own parser.
    Lol, again. I said that, too! :-) Getting help doing that is why I'm here, if parse() really is moving my data around! And yes, I'm revisiting my Real 3D Stars Generator script and adding more robust data checks.
    Last edited by Chrusion; 03-16-2017 at 04:46 PM.

  9. #9
    ernpchan
    Quote Originally Posted by Chrusion
    Lol... did you not read my posts? Nothing. I want NOTHING in those array cells. Why in the world would anyone want a data set all shifted out of place?

    Lol, again. I said that, too! :-) Yes, help to do that is why I'm here, if parse() really is moving my data around! Yes, I'm revisiting my Real 3D Stars Generator script and adding more robust data checks.
    Sorry, the thread got long pretty fast, so I just skipped ahead to what I felt was an answer. Making your own parser would be the way to go. I've never run into a parser/split command that returns a null/placeholder value for what it didn't find.

    If you can do this in Python, it might be easier. LScript gets pretty limited after a while.
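    For the Python route, a sketch using the standard csv module: csv.reader returns empty fields as empty strings in their original positions (it also handles quoted fields), so only the empty strings need mapping to None. read_rows is a hypothetical helper; an in-memory io.StringIO stands in for the real file here.

```python
import csv
import io


def read_rows(stream):
    """Yield one list per CSV line, with empty fields kept in place as None."""
    for row in csv.reader(stream):
        yield [cell if cell != "" else None for cell in row]


data = io.StringIO(",,2.5,5.1\n101,,,404\n")
for row in read_rows(data):
    print(row)
```

    For a file on disk, the csv documentation recommends opening it with newline="", e.g. read_rows(open(path, newline="")).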

  10. #10
    Chrusion (gold plated 3D)
    Thanks anyway, but Python ain't gonna happen, and my programming and math skills with strleft(), strsub(), strright(), for loops, and if/thens are entirely inadequate for this task. :-(

  11. #11
    Gorbag (Geist im Maschine)
    Join Date: Jul 2003
    Location: Lakewood, CO
    Posts: 39
    Quote Originally Posted by ernpchan
    I think parse is working as expected. What would you expect it to return if nothing was there?

    You could make your own parser. Find the locations of the ',' and do string extractions based on that. You'd want to put in condition checks to check if there's actual data between the ',' locations. If there is no valid data, then insert what you want.
    Pure script code like this, in a stand-alone function, would work:

    Code:
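    parse_csv: str, sep
    {
        values = nil;   // result array, built up one field at a time
        val = "";       // characters of the field currently being collected
        for(i = 1;i <= str.size();++i)
        {
            if(str[i] == sep)
            {
                // separator reached: store the field, or nil if it was empty
                values += (val.size() == 0) ? nil : val;
                val = "";
            }
            else
                val += str[i];
        }
    
        if(val.size())
            values += val;   // the last field had data
        else if(str[str.size()] == sep) // if it ends with a separator, the last field is nil
            values += nil;
    
        return values;
    }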
    For the input string ",,2.5,,5.1," the above LScript function would return [(nil), (nil), "2.5", (nil), "5.1", (nil)].
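    To check the logic outside LightWave, here is a line-for-line Python transcription of the function above (a sketch, not LScript; Python's None stands in for LScript's nil, and list indices start at 0 rather than 1). It reproduces the result quoted for ",,2.5,,5.1,".

```python
def parse_csv(text, sep=","):
    """Python transcription of the parse_csv LScript function above."""
    values = []
    val = ""                                  # characters of the field being collected
    for ch in text:
        if ch == sep:
            values.append(val if val else None)  # empty field -> None
            val = ""
        else:
            val += ch
    if val:
        values.append(val)                    # last field had data
    elif text.endswith(sep):
        values.append(None)                   # trailing separator -> empty last field
    return values


print(parse_csv(",,2.5,,5.1,"))
```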

    Otherwise, you can wait for the next release. I've added an optional third Boolean argument to parse() (and an optional second argument to FileObject.parse()) that, if set to true, will 'include nil fields' in the array (like the LScript code above). Otherwise, both will continue to work as they do today, keeping only the fields that actually contain values.
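    The two behaviours of that optional flag can be pictured with a hypothetical Python stand-in. parse_fields and its include_nil parameter are made-up names that only illustrate the behaviour described above, not the actual LScript signature. Like the documented parse(), it splits at any character in the separator string.

```python
import re


def parse_fields(separators, text, include_nil=False):
    """Split `text` at any single character found in `separators` (must be non-empty).

    include_nil=False: empty fields are dropped (the current behaviour discussed in this thread).
    include_nil=True:  empty fields are kept as None, so every value keeps its position.
    """
    # Character class such as "[,]" or "[,;]"; re.escape protects characters like "]" or "-".
    pattern = "[" + re.escape(separators) + "]"
    fields = re.split(pattern, text)
    if include_nil:
        return [field if field else None for field in fields]
    return [field for field in fields if field]
```

    Defaulting the flag to False mirrors the stated design: existing scripts keep the old behaviour unless they opt in.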

  12. #12
    Chrusion (gold plated 3D)
    THANKS, Gorbag!!! I should have waited a little longer, but I got impatient and tried my VERY CRUDE hand at coding a parser. This is what I came up with, and it appears to work, but yours is SO MUCH more elegant! I will replace mine with yours, if that's OK. I was thinking of making mine into a function as well, but just went with the brute force way of things.

    Code:
    line = CSVfile.read();
    chrcnt = 0; // number of characters in the current field
    idx = 1;    // next cell of starData to fill
    for(j = 1; j <= size(line); j++) {
    	chr = strsub(line, j, 1);
    	if(chr != ",") { // not a comma: one more character in the current field
    		chrcnt++;
    	} else { // comma found: close off the current field
    		if(chrcnt == 0) { // no characters counted, so the field is empty; store the placeholder string "nil"
    			starData[idx] = "nil";
    		} else { // characters counted, so copy the field, which starts chrcnt characters before the comma
    			starData[idx] = strsub(line, j-chrcnt, chrcnt);
    		}
    		idx++;
    		chrcnt = 0;
    	}
    }
    // put the last field into the array (the loop only stores a field when it reaches a comma)
    if(chr == ",") starData[idx] = "nil"; // last character of the line is a comma, so the last field is empty
    else starData[idx] = strsub(line, j-chrcnt, chrcnt); // after the loop, j = size(line) + 1

  13. #13
    Chrusion (gold plated 3D)
    OK... next step, Gorbag... does your parse_csv() function go inside main{} or outside?

    Never mind... it goes outside. The errors I was getting were due to feeding the wrong variable to the function. Works now.
    Last edited by Chrusion; 03-17-2017 at 08:50 AM.
