User talk:Daimona Eaytoy

From Wikipedia, the free encyclopedia

Welcome to Thai Wikipedia

Welcome, Daimona Eaytoy, to Thai Wikipedia. The following pages may be useful to you:

Newcomers are strongly advised to start by editing or expanding existing articles. Do not rush to create articles yourself, because they often fail review and get deleted.

Introduction to the site

and

Learn to edit (please take a short while to read this to learn the basics)

Alternatively, read the page Contributing to Wikipedia, which summarizes everything on a single page.

I've read everything and still don't understand at all
Ask at the Help desk, or ask right here on this page! Or use the Discord chat

Don't forget to sign on talk pages by typing --~~~~; your name and the date and time will then appear.

Hello Daimona Eaytoy! Welcome to Thai Wikipedia. If you are not a Thai speaker, you can ask a question in our Guestbook.


-- New user message (talk) 21:40, 27 February 2018 (ICT)

Abusefilter

Hey Daimona!

Not sure if you can remember me. We had a conversation a couple of weeks ago when you notified us that our filter is buggy (thanks again for that!). I'm also the user at Phabricator who just created and commented on a few tasks related to Abusefilter. More importantly, I just started a research project related to the Abusefilter language. I have a lot of questions about the language, and hope that you can help me out.

  • User variables seem to be case-insensitive, contradicting mw:Extension:AbuseFilter/Rules format#User-defined variables. Could you confirm that this is expected? (I wish they were case-sensitive, though...)
  • What's the most efficient way to get filter information across all Wikimedia projects? In particular, is there a better way than querying the API endpoints of all projects? (I have access to WMCS, FWIW.)
  • Is there a way to apply as a global abusefilter helper?

By the way, I just finished formalizing and implementing the parser. I will dive into the semantics of the filter language next and will probably have a lot more questions & bug reports.

Also, let me know if I should communicate with you through another channel.

Best --Nullzero (talk) 11:55, 1 November 2019 (+07)

@Nullzero: Hey! I'm very glad to hear about your interest in the AF language, I warmly appreciate it :-) To answer your questions:
  • Yes, all variables are case-INsensitive. I don't know why I wrote the opposite on mw.org, maybe I just got confused (anyway, updated now). The main reason vars are case-insensitive is that, in the early days of AF, variables were documented as uppercase, and the code also generated them uppercase (even though the parser strtolower'ed them). Hence, changing to case-sensitiveness would require making sure that no filter will break. However, given that (usually) filters will use only a handful of variables (or even none), it should be fine to keep them insensitive.
  • What kind of information would you need? Querying all API endpoints seems subpar; alternatively, you could query the replicas, using a big UNION across all DBs. Which isn't par, either. There are some scripts used to query all DBs, but I think you need production shell access for them. Unfortunately, I'm not aware of any alternative.
  • The place to do that is at m:Steward requests/Global permissions. I don't know whether there are specific requirements for that, though.
Again, thank you very much for your interest. Please feel free to ping me back, I'd be glad to answer your questions! --Daimona Eaytoy (talk) 20:48, 1 November 2019 (+07)

Formal grammar and questions about built-in vars

Thanks for the above response :)

I have two topics today: formal grammar and built-in variables.

Formal grammar

Here's my formal (context-free) grammar of the language that I was talking about. Feel free to modify it and post it at mw:Extension:AbuseFilter/Rules format if you think it's useful.

expr : seq-expr

seq-expr : [seq-expr ";"] assign-expr [";"]

assign-expr : cond-expr
            | arr-append-expr
            | arr-assign-expr
            | var-assign-expr

arr-append-expr : IDENT "[" "]" ":=" assign-expr
arr-assign-expr : IDENT "[" expr "]" ":=" assign-expr
var-assign-expr : IDENT ":=" assign-expr

cond-expr : "if" boolbin-expr "then" expr "else" expr "end"
          | "if" boolbin-expr "then" expr "end"
          | boolbin-expr "?" expr ":" expr
          | boolbin-expr

boolbin-expr : [boolbin-expr ("|" | "&" | "^")] comp-expr
comp-expr : [sum-expr ("==" | "=" | "<" | ">" | "<=" | ">=" | "!=" | "===" | "!==")] sum-expr
sum-expr : [sum-expr ("+" | "-")] product-expr
product-expr : [product-expr ("*" | "/" | "%")] expt-expr
expt-expr : [expt-expr "**"] not-expr
not-expr : ["!"] kw-expr
kw-expr : [neg-expr KEYWORD] neg-expr
neg-expr : [("-" | "+")] arr-acc-expr
arr-acc-expr : value-expr  | arr-acc-expr "[" expr "]"
value-expr : integer-expr
           | float-expr
           | string-expr
           | var-expr
           | arr-expr
           | "(" expr ")"
           | app-expr

app-expr : IDENT "(" [expr ("," expr)*] [","] ")"
integer-expr : INTEGER
float-expr : FLOAT
string-expr : STRING
var-expr : IDENT
arr-expr : "[" [expr ("," expr)*] [","] "]"
  • | means alternation
  • [...] means optional.
  • * means zero or more times (Kleene star).
  • Parens are for grouping.
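
The left-recursive binary rules in the grammar above (e.g. sum-expr and product-expr) map naturally onto a loop that folds operands to the left. The following is a minimal sketch of that idea for just two precedence levels and integer literals; it is not the extension's parser, and the tuple-based AST and all function names are assumptions made for illustration.

```python
import re

# Tokens for this sketch only: integers, arithmetic operators, parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*/%()])")

def tokenize(src):
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_left_assoc(tokens, i, operators, parse_operand):
    """Parse `operand (op operand)*`, building a left-nested tree."""
    node, i = parse_operand(tokens, i)
    while i < len(tokens) and tokens[i] in operators:
        op = tokens[i]
        right, i = parse_operand(tokens, i + 1)
        node = (op, node, right)
    return node, i

def parse_atom(tokens, i):
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        node, i = parse_sum(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected )")
        return node, i + 1
    if tok.isdigit():
        return int(tok), i + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_product(tokens, i):
    return parse_left_assoc(tokens, i, {"*", "/", "%"}, parse_atom)

def parse_sum(tokens, i):
    return parse_left_assoc(tokens, i, {"+", "-"}, parse_product)

def parse(src):
    tokens = tokenize(src)
    node, i = parse_sum(tokens, 0)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return node
```

With this sketch, `parse("1 - 2 - 3")` nests to the left, and `parse("1 + 2 * 3")` binds the product more tightly than the sum, which is what the layering of the grammar rules expresses.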

Discussion

Unlike the recursive descent parser which is currently in use, my parser which is generated from the above grammar is more lenient. For example, it doesn't check that IDENT in app-expr must be one of the built-in function names right away. That job belongs to the well-formedness checker, which will be run after parsing is done. In principle, this is better for the following reasons:

  1. It allows us to give a much better error message for things that are intuitively parsable but not well-formed rather than just "unexpected token".
    • Consider [lengt(arr)] and true := 1. The current parser fails with "expecting , or ] but found (" and "unexpected T_OP". The well-formedness checker could instead report "lengt is not a function" and "true can't be assigned", respectively.
    • Incidentally, my previous research group created an educational programming language for high school students and did a lot of study on error messages [1] [2]. It's been proven to us over and over again that parse errors are the most unhelpful kind of error.
  2. This allows more optimization. The parser must be run every time a filter is run, but the well-formedness checker needs to run only once for each filter modification. So the separation reduces the work that the parser needs to do at runtime.
    • In fact, the parsing cost could be avoided as well. E.g., for each filter modification, save a serializable AST, which is guaranteed to be well-formed, to the database in addition to the filter code. At runtime, we then only need to deserialize the AST, and this should be cheaper than parsing from scratch.
  3. It makes the grammar cleaner.
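
To make the separation concrete, here is a minimal sketch of a well-formedness pass that runs over an already-built AST and produces the two messages mentioned above. The tuple-based AST shape, the function name check_well_formed, and the small name sets are assumptions for illustration only; they are not taken from the extension's code.

```python
# Illustrative subsets only; the real extension defines many more names.
KNOWN_FUNCTIONS = {"length", "lcase", "count"}
RESERVED_WORDS = {"true", "false", "null"}

def check_well_formed(node, errors=None):
    """Walk an already-parsed AST and collect human-readable errors."""
    if errors is None:
        errors = []
    kind = node[0]
    if kind == "app":                      # ("app", name, [args])
        _, name, args = node
        if name.lower() not in KNOWN_FUNCTIONS:
            errors.append(f"{name} is not a function")
        for arg in args:
            check_well_formed(arg, errors)
    elif kind == "assign":                 # ("assign", name, value)
        _, name, value = node
        if name.lower() in RESERVED_WORDS:
            errors.append(f"{name} can't be assigned")
        check_well_formed(value, errors)
    elif kind == "arr":                    # ("arr", [items])
        for item in node[1]:
            check_well_formed(item, errors)
    # ("var", name) and ("lit", value) need no check in this sketch.
    return errors

# [lengt(arr)] and true := 1, written as hand-built ASTs:
array_example = ("arr", [("app", "lengt", [("var", "arr")])])
assign_example = ("assign", "true", ("lit", 1))
```

Because the parser accepts both inputs, the checker can name the actual problem instead of reporting an unexpected token.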

Incompatibility

The only incompatibility I know of (assuming that we have correct well-formedness checks) is that this grammar allows trailing commas in function applications. It's easy to modify the grammar to restrict trailing commas to only variadic functions, as the current parser does. However, I will need to hard-code these variadic functions into the grammar, and I don't think it's worth doing that.

Built-in variables

I'm going through each built-in variable in mw:Extension:AbuseFilter/Rules format right now. Here are my questions:

  1. As I understand, for "createaccount" action, only action, accountname, and timestamp are defined. Should the caution "... except for one case: account creation when the creator is not logged in. All variables starting with user_ are affected." be updated to include page_ (and article_) variables too?
    • What about "autocreateaccount" action, particularly w.r.t. page_ variables?
  2. The second caution "... will make any code using the variable in question return false." is not correct because it returns the undefined value, yes?
  3. Should the datatype of user_emailconfirm be "string/null"?
  4. Should the description of user_blocked be "... False otherwise."?
  5. The examine interface shows "false" value as empty. Is this a bug? Or is it intentional?
  6. It seems file_mime and file_mediatype are missing, correct?
    • Am I right that both have type string?
    • Are any of them nullable?
  7. The following are undocumented: moved_from/to_restrictions_*, moved_from/to_recent_contributors, moved_from/to_first_contributor. Do they have the exact same type as page_... variants?
  8. Should timestamp be moved to the "always available variable" table?
  9. moved_to_namespace and moved_from_namespace should be integer, right?
  10. Is accountname only available in "createaccount" action? What about "autocreateaccount"?
  11. Is it correct that global_user_groups, user_app, and user_mobile are available in every action except "createaccount" and "autocreateaccount"?

Thank you in advance! --Nullzero (talk) 16:43, 3 November 2019 (+07)

@Nullzero: Wow, that's magnificent :D Thanks a lot for your work.
  • First of all, the grammar. I haven't checked whether it's correct (at a quick glance, it seems so), but in any case, I do think it would be useful to have it somewhere on mw.org.
  • Reply to "Discussion": yes, right now we do check that IDENT is a known function name. I think we could delay that to another phase, but only as long as we run through that phase when checking the syntax. It's vital for us to tell people that a filter is wrong at the very time they save it; not while it's executed. This means that we cannot just try to build the AST in order to validate a filter (i.e. when you press "check syntax", or try to save it); otherwise, some kind of errors (e.g. undeclared variables) would never be reported. However, this also means that we have to *fully* parse a filter just to check if it's "syntactically" valid. In this respect, I had opened phab:T231536. More specifically:
    1. Yes, our error messages suck. Any improvement in this area would be highly appreciated.
    2. We already do something like that. The current process is: when a filter is executed for the first time, we build the AST and save it in cache (for 24 hours), then we evaluate it with the current variables; every subsequent execution will only retrieve that AST and evaluate it with the new variables. We can, of course, add another layer (the well-formedness checker) between building the AST and caching.
  • As for the trailing commas incompatibility: currently, we do use a hardcoded list of variadic functions. There aren't many functions in AF, so it's pretty low-cost to do that. Of course, if it turns out to become expensive to be included in a well-formedness checker, we can allow trailing commas.
  • Answers to "Built-in variables":
    1. No, because page_ variables are never available for account creations. The special case mentioned in the warning only refers to account creations where the creator is not logged in. In this case, you may expect user_ variables to be set (just like other account creations), but they aren't. This goes for both createaccount and autocreateaccount. For neither of them are page_ variables available, regardless of whether the creator is logged in.
    2. Yes. The warning is very simplified and avoids mentioning the undefined type. In theory, the undefined value should behave exactly like an "immutable" false for an observer.
    3. Yes, thanks; updated.
    4. As of today, yes. I don't know whether in the past it could really be null, but I guess that we shouldn't care about that, so I've updated that line.
    5. It's just a graphical bug due to how PHP values are injected in the HTML (and also the lack of a <pre> element). See phab:T190653 and gerrit:421966.
    6. Yes, they're missing. I've added them, and AFAICS they cannot be nullable.
    7. Yes, added.
    8. Yes, thanks.
    9. Yes, updated.
    10. It's available for both createaccount and autocreateaccount.
    11. More precisely, they're available iff other user_ variables are available. This falls in the case discussed in (1.), i.e. they're always available except for account creations where the creator is not logged in.
I hope I've answered everything! Thanks again, --Daimona Eaytoy (talk) 20:45, 4 November 2019 (+07)

Well-formedness checker & more questions

Again, thanks for your answers. I really appreciate your help.

Conservative well-formedness checker

Let me describe how the scope checking process, which is a part of well-formedness checking (a.k.a. Check Syntax), could detect unbound variables without hoisting. If you think this is a good idea and you want to implement it, feel free to. Otherwise, I will implement it, though it will take a while for me to get to this. Note that as a bonus, we will be able to remove all hoisting-related code.

Let $B \vdash e \dashv B'$ be a 3-place relation that intuitively means "$B$ are variables bound before evaluating $e$, and $B'$ are variables bound after evaluating $e$".

With that, here are syntax-directed en:inference rules that will allow you to derive $\emptyset \vdash e \dashv B'$ (for some $B'$) if and only if $e$ doesn't have any unbound variables (with the current semantics of unbound variables).

  • These are the boring, housekeeping rules that don't really do anything useful: [rule formulas not preserved]

  • These are interesting rules:
    • Vars must be bound
    • Assignments cause vars to be bound
    • Array assignments expect vars to be bound
    • Array appends expect vars to be bound
    • Conditionals propagate bound variables from both branches
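
One plausible way to write the five interesting rules is sketched below. The judgment $B \vdash e \dashv B'$ ("with $B$ bound before evaluating $e$, $B'$ is bound afterwards"), the rule names, and the evaluation order in the array-assignment rule are assumptions chosen to agree with the bullet descriptions and with the pseudocode in the Algorithm section; they are not recovered from the original formulas. Here $x$ ranges over user-defined variables.

```latex
\[
\frac{x \in B}{B \vdash x \dashv B}\ (\textsc{Var})
\qquad
\frac{B \vdash e \dashv B'}{B \vdash x := e \dashv B' \cup \{x\}}\ (\textsc{Assign})
\]
\[
\frac{x \in B \qquad B \vdash e_1 \dashv B_1 \qquad B_1 \vdash e_2 \dashv B_2}
     {B \vdash x[e_1] := e_2 \dashv B_2}\ (\textsc{ArrAssign})
\qquad
\frac{x \in B \qquad B \vdash e \dashv B'}
     {B \vdash x[\,] := e \dashv B'}\ (\textsc{ArrAppend})
\]
\[
\frac{B \vdash c \dashv B_c \qquad B_c \vdash t \dashv B_t \qquad B_c \vdash f \dashv B_f}
     {B \vdash \texttt{if}\ c\ \texttt{then}\ t\ \texttt{else}\ f\ \texttt{end} \dashv B_t \cup B_f}\ (\textsc{Cond})
\]
```

In this reading, only Assign ever adds a name to the bound set, and Cond is the one rule where information from two alternative paths has to be merged.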

The only cases I have not dealt with are set, set_var, &, and |. However, those can be rewritten to something that is already handled. E.g.,

  • rewrite set("<a>", <b>) to <a> := <b>
  • rewrite <a> & <b> to if <a> then (if <b> then true else false end) else false end

Then, use the rules above. This completes the scope checking process.

Algorithm

We can convert the above rules to an algorithm: let boundAfter be a function which either computes the set of variables that are bound after the input AST node is evaluated, provided that the input set of variables is already bound, or errors because it detects unbound variables. Here's pseudocode with a few example cases: variables, conditionals, and sequences.

function boundAfter(e, B):
  // B: set of variables already bound before e is evaluated
  if e is a variable and e is not built-in:
    when e is not in B: raise exception end
    return B
  else if e is a conditional:
    c, t, f = e.children   // condition, then-branch, else-branch
    boundAfterC = boundAfter(c, B)
    return setUnion(boundAfter(t, boundAfterC), boundAfter(f, boundAfterC))
  else if e is a sequence:
    l, r = e.children
    return boundAfter(r, boundAfter(l, B))
  .
  .
  .
  end
end

boundAfter(rootAST, emptySet)
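
For reference, the pseudocode above can be turned into a small runnable sketch. The tuple-based AST, the BUILTINS set, the UnboundVariable exception, and the extra "lit" and "assign" cases are assumptions added so that the example is self-contained; they are not part of the original pseudocode.

```python
BUILTINS = {"action", "timestamp"}  # illustrative subset of built-in variables

class UnboundVariable(Exception):
    pass

def bound_after(node, bound):
    """Return the set of variables bound after `node` is evaluated,
    given that the variables in `bound` are bound beforehand."""
    kind = node[0]
    if kind == "lit":                       # ("lit", value)
        return bound
    if kind == "var":                       # ("var", name)
        name = node[1]
        if name not in BUILTINS and name not in bound:
            raise UnboundVariable(name)
        return bound
    if kind == "assign":                    # ("assign", name, value)
        _, name, value = node
        return bound_after(value, bound) | {name}
    if kind == "if":                        # ("if", cond, then, else-or-None)
        _, cond, then, otherwise = node
        after_cond = bound_after(cond, bound)
        after_then = bound_after(then, after_cond)
        after_else = (bound_after(otherwise, after_cond)
                      if otherwise is not None else after_cond)
        return after_then | after_else      # union: the conservative variant
    if kind == "seq":                       # ("seq", left, right)
        _, left, right = node
        return bound_after(right, bound_after(left, bound))
    raise ValueError(f"unknown node kind: {kind}")

# a := 1; a
ok_program = ("seq", ("assign", "a", ("lit", 1)), ("var", "a"))
# b   (never assigned)
bad_program = ("var", "b")
```

Calling `bound_after(ok_program, frozenset())` returns the set containing `a`, while `bound_after(bad_program, frozenset())` raises UnboundVariable.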

Advantages

The advantage of this scope checking is that it guarantees the entire syntax checking. The current Check Syntax attempts to achieve the same goal, but it needs various hacks on top of the existing interpreter, which is not meant to perform static analysis, and it is full of bugs (as you are well aware).

The scope checking process is very fast, and it is done at Check Syntax time. It should be completely equivalent to what the current Check Syntax is trying to do regarding unbound variables, except that it should be much more straightforward. It is also easy to piggyback other static checks, like the ones for phab:T234339 in this phase.

Liberal well-formedness checker

In the above scope checking, if a filter has no error re: unbound variables (i.e., all variables are bound), then it will definitely pass the checking process. That is, the checking system is complete. However, as you know, it actually does allow unbound variables to some extent, admitting the program false & (a := true); a even though a could be considered unbound in the strict sense. So one could say that the system is unsound.

There's something interesting about the above inference rules. By flipping set union to set intersection in the conditionals case, we will get a sound but incomplete system. Every filter that passes the check will definitely have no unbound variables. However, it rejects the following filter (false & (a := 1)) & a which has no error (although it totally makes sense if you are willing to accept phab:T234690).

I tried both variants on all public filters in all Wikimedia projects (via the API, unfortunately). The conservative variant accepts every filter (which totally makes sense; otherwise either my checking or the current checking process is really buggy!), while the liberal variant rejects a tiny fraction of filters (though there are a few).

Having a sound system means that we can reject any program with unbound variables. It is a step toward removing the undefined value.
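
The difference between the two variants comes down to how the two branches of a conditional are merged. The sketch below parameterizes that merge and runs both example filters from this section through it. The tuple-based AST and the function names are assumptions for illustration, and `&` is handled by the rewrite to nested conditionals described earlier.

```python
def bound_after(node, bound, join):
    """Scope check where `join` merges the bound sets of two branches."""
    kind = node[0]
    if kind == "lit":
        return bound
    if kind == "var":
        if node[1] not in bound:
            raise NameError(node[1])
        return bound
    if kind == "assign":                    # ("assign", name, value)
        return bound_after(node[2], bound, join) | {node[1]}
    if kind == "seq":                       # ("seq", left, right)
        return bound_after(node[2], bound_after(node[1], bound, join), join)
    if kind == "and":                       # a & b  ==>  nested conditionals
        inner = ("if", node[2], ("lit", True), ("lit", False))
        return bound_after(("if", node[1], inner, ("lit", False)), bound, join)
    if kind == "if":                        # ("if", cond, then, else)
        after_cond = bound_after(node[1], bound, join)
        return join(bound_after(node[2], after_cond, join),
                    bound_after(node[3], after_cond, join))
    raise ValueError(f"unknown node kind: {kind}")

def accepts(program, join):
    try:
        bound_after(program, frozenset(), join)
        return True
    except NameError:
        return False

# false & (a := true); a
first = ("seq",
         ("and", ("lit", False), ("assign", "a", ("lit", True))),
         ("var", "a"))
# (false & (a := 1)) & a
second = ("and",
          ("and", ("lit", False), ("assign", "a", ("lit", 1))),
          ("var", "a"))
```

With `frozenset.union` as the join, both filters are accepted; with `frozenset.intersection`, both are rejected, including the second one, which the text above describes as having no error.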

Sound & complete checker

Sound & complete checker should be possible, as the filter language is not Turing complete. However, it will be much more difficult. That's what I'm working on right now.

Questions

  1. What's your opinion on the variable TRUE? The error message says something about unbound "true", which I guess could be confusing.
  2. What's your thought on how true+true evaluates to 2.0 and not 2? The advantage of casting everything to float is that you don't need to worry whether "1" or "1.23" should be cast to integer or float: just cast it to float! But it's also counterintuitive IMO to cast things that can be integers to float.
  3. What's your opinion on how (a := [1]) | (a := [1, 2]); a[1] passes Check Syntax? Should this be allowed?
  4. Why is there an expiration of the cache? It should not take that much space per filter, right?

Regards, --Nullzero (talk) 03:33, 7 November 2019 (+07)

Hey! First of all, I'm not so expert about formal grammars etc., so I probably won't be able to provide definitive answers. That said, implementing a scope checker is definitely something that I want to do at some point. Removing hoisting-related code would also be nice! I don't know when I'll have time to do that, but any help from you would be highly appreciated. I think I understood the properties of (un)sound & (in)complete checkers, although I don't know which one should be preferred - sound but incomplete IIUC, huh? Unless a sound & complete checker can exist (as you said), which I guess would completely eliminate the downsides.
As for your questions:
  1. TRUE should IMHO be case-insensitive, like e.g. in PHP. We already allow insensitive var names, so I don't see any reason to have case-sensitive identifiers.
  2. I agree with you. Previously, the parser used to cast *everything* to float, even integers! I had mitigated it with this commit last year. What we should do is change AFPData::toNumber to return an integer in other cases, e.g. for bools (or null). It should be pretty easy.
  3. I think it should not be allowed because the element at [1] is not guaranteed to exist. Although I guess it could be pretty hard to understand what's going on from within the syntax checker.
  4. I don't know for sure, I think it's just a "safety measure". There are plans to improve the caching a lot, see e.g. the TODO at phab:T234427.
Thank you again, --Daimona Eaytoy (talk) 18:05, 7 November 2019 (+07)
Yet again, thanks for your response, and I highly appreciate your endorsement of my request at metawiki. :) --Nullzero (talk) 23:28, 7 November 2019 (+07)

A couple of other questions

Hi Daimona,

I'm curious how you detect broken filters, including the one you reported to us. Did you find it from an error log somewhere? If not, how thorough is this detection?

I just implemented another tool that would similarly detect this kind of mistake statically. It detected a buggy filter created in August on srwiki. So I would like to understand why your tool did not catch this mistake, and whether there's anything I could help with. --Nullzero (talk) 08:00, 13 November 2019 (+07)

@Nullzero: Hey! I saw them logged in WMF logstash; I usually add some logging to the code when I want to make a breaking change, and see what filters are reported. Spotting "broken" filters is then relatively easy, but not even close to static analysis in terms of efficiency :) --Daimona Eaytoy (talk) 20:31, 13 November 2019 (+07)

Patchset

Hello!

Thanks so much for your review. I believe I addressed most of your feedback. Let me ask you a few additional questions however.

Running PHPUnit

It would be really helpful for me if I can run PHPUnit locally on my computer. So far, my attempts have been a failure. For instance, when running composer phpunit at the root of my MediaWiki installation, I got:

/Applications/MAMP/htdocs/core ❯❯❯ composer phpunit
> phpunit

Fatal error: Cannot declare class AFPDataTest, because the name is already in use in /Applications/MAMP/htdocs/core/extensions/AbuseFilter/tests/phpunit/Unit/AFPDataTest.php on line 239
Script phpunit handling the phpunit event returned with error code 255

Is this the correct way to run PHPUnit? If so, do you know how to fix this problem?

Jenkins bot

Do you know why Jenkins bot is being <s>a jerk</s> unresponsive to me? How do I make it run? It used to work fine for me (though that was several years ago).

Thanks again, --Nullzero (talk) 11:36, 21 November 2019 (+07)

Hey! Thank you for your contributions! I'll try to review the patch further as soon as I have some time. To answer your questions:
  • For PHPUnit, I'm assuming that you're not using an IDE like PHPStorm. If my assumption is correct, it depends on what tests you need. I'm also assuming that you're interested in parser tests. MW core has a wrapper, composer phpunit:unit, to easily run unit tests. Since you're only interested in a single test, you could use it like cd path/to/core && composer phpunit:unit path/to/AbuseFilterParserTest.php. The same goes for other tests in the unit/ folder. Otherwise, it depends. For instance, if you want to run *all* AF tests (but you usually don't need to do that for parser changes), you can use cd path/to/core && php tests/phpunit/phpunit.php --group AbuseFilter.
  • As for jenkins-bot (or, as it's known when voting -1, jerkins-bot), there's a whitelist of users who will automatically trigger its checks. You have to be added to that whitelist, which I'm going to do soon.
Thank *you* again, --Daimona Eaytoy (talk) 17:43, 21 November 2019 (+07)

Toward removing DUNDEFINED at runtime

Re: above, I got PHPUnit working now! Somehow composer doesn't work for me at all, presumably because I have several versions of PHP on my computer. Your php tests/phpunit/phpunit.php --group AbuseFilter on the other hand works perfectly once I replace php with /Applications/MAMP/bin/php/php7.3.8/bin/php. And thanks for adding me to the whitelist!

As I have said many times, I really dislike DUNDEFINED at runtime. It is unintuitive and has weird behaviors. E.g., (a === b | a !== b) doesn't evaluate to true.

DUNDEFINED could arise in two instances at runtime.

  1. From "hoisted" user-defined variable. If AFPSyntaxChecker's liberal mode is used, then this instance of DUNDEFINED will be gone. (So I do hope that we will eventually move to that mode).
  2. From unavailable built-in variables. This is the one I want to focus on in this conversation.

The current Check Syntax, albeit not perfect, can be modified to detect the second instance of DUNDEFINED. The weird thing is that, to detect patterns that could result in DUNDEFINED at runtime, we will use DUNDEFINED at compile-time! Check Syntax needs to be modified so that when the condition of a conditional evaluates to a non-DUNDEFINED value, it picks only one branch to evaluate; otherwise, it evaluates both branches. Note that the current implementation of Check Syntax is buggy because it doesn't evaluate both branches correctly, as I showed you in the previous conversation.

Now, suppose we have a version of Check Syntax that satisfies the above requirement. Let $A$ be the set of all possible actions. For each $a \in A$, run the current Check Syntax with action set to $a$ and every other built-in variable either (1) set to DUNDEFINED, if it's available when action is $a$, or (2) not set at all otherwise.

If there exists $a \in A$ such that Check Syntax says that a built-in variable is unbound, then we can warn (or even stop) users, saying that in action $a$ the built-in variable is potentially undefined. I used the word "potentially" because it's not super accurate. Consider:

if action !== 'upload' then
  if (timestamp !== timestamp + 1) then null else file_size end
end

The checker will say that, when $a$ is 'edit', file_size is potentially unbound. However, this is in fact safe because timestamp !== timestamp + 1 is a tautology, making file_size unreachable. It's just that it's kinda hopeless for a computer to prove that timestamp !== timestamp + 1 is a tautology, so we need to approximate it with DUNDEFINED, causing the inaccuracy.

On the other hand, if the algorithm indicates that a pattern has no unbound built-in variable, then it indeed has no such error!

(It might also be worth renaming DUNDEFINED to DSYMBOLIC at that point. What I describe above is a poor man's en:symbolic execution engine).
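
The per-action procedure described above can be sketched as follows. This is only an illustration of the idea, not a patch against the extension: the tuple-based AST, the SYMBOLIC sentinel (standing in for DUNDEFINED), the function names, and especially the AVAILABLE map of actions to built-in variables are all hypothetical.

```python
SYMBOLIC = object()  # stands in for DUNDEFINED: "value not known statically"

def evaluate(node, env, problems):
    """Evaluate `node`; record built-in variables that are not in `env`."""
    kind = node[0]
    if kind == "lit":                       # ("lit", value)
        return node[1]
    if kind == "var":                       # ("var", name)
        if node[1] not in env:
            problems.add(node[1])
            return SYMBOLIC
        return env[node[1]]
    if kind in ("ne", "add"):               # ("ne", l, r) is !==, ("add", l, r) is +
        left = evaluate(node[1], env, problems)
        right = evaluate(node[2], env, problems)
        if left is SYMBOLIC or right is SYMBOLIC:
            return SYMBOLIC
        return (left != right) if kind == "ne" else (left + right)
    if kind == "if":                        # ("if", cond, then, else-or-None)
        cond = evaluate(node[1], env, problems)
        if cond is not SYMBOLIC:
            # Concrete condition: only the taken branch is examined.
            chosen = node[2] if cond else node[3]
            return evaluate(chosen, env, problems) if chosen is not None else None
        # Unknown condition: both branches are examined.
        for branch in (node[2], node[3]):
            if branch is not None:
                evaluate(branch, env, problems)
        return SYMBOLIC
    raise ValueError(f"unknown node kind: {kind}")

def potentially_unbound(ast, available):
    """Map each action to the built-in variables that may be unbound under it."""
    report = {}
    for action, names in available.items():
        env = {name: SYMBOLIC for name in names}
        env["action"] = action
        problems = set()
        evaluate(ast, env, problems)
        if problems:
            report[action] = problems
    return report

# Hypothetical availability map, for illustration only.
AVAILABLE = {
    "edit": {"timestamp", "added_lines"},
    "upload": {"timestamp", "file_size"},
}

# if action !== 'upload' then
#   if (timestamp !== timestamp + 1) then null else file_size end
# end
FILTER = ("if", ("ne", ("var", "action"), ("lit", "upload")),
          ("if", ("ne", ("var", "timestamp"),
                  ("add", ("var", "timestamp"), ("lit", 1))),
           ("lit", None), ("var", "file_size")),
          None)
```

Here `potentially_unbound(FILTER, AVAILABLE)` reports file_size under 'edit' and nothing under 'upload', matching the false positive discussed above: the sketch cannot see that the inner condition is a tautology.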

--Nullzero (talk) 20:53, 21 November 2019 (+07)

@Nullzero: Re PHPUnit: you can also specify the php executable for composer (i.e. path/to/php composer phpunit...), dunno if it will work though. As for whitelisting, someone has to approve, but that usually happens pretty quickly. As for DUNDEFINED, I also don't like it. However, I had to implement it as a quick (and hackish) solution for various underlying bugs. The sooner we can get rid of it, the better! Also, as for the two instances you mentioned: (1.) is the bad one IMHO, and maybe the easier to get rid of; (2.) is where we *really* need some safety net like DUNDEFINED (the reason is still the same: an incautious filter editor writing the rule edit_delta < -50000 and setting it to disallow will effectively disallow all non-edit actions -- and this is just a common example). That said, I like the idea you've outlined above, of running the syntax checker multiple times for all actions, and reporting all potentially unavailable variables. I also think we shouldn't worry about tautologies; I believe that sort of "error" isn't seen very often, and emitting an error if we find one has its benefits. I'm afraid I won't be able to implement such a feature (definitely not soon), but I'd be glad to review a patch of yours. Unrelated, it would be cool if we could report some kinds of errors at runtime with JavaScript validation -- it would also enable us to report "non-blocking" issues (i.e. warnings) for non-serious stuff. There's a task about that (T187686), but unfortunately I couldn't find an easy way to implement it (mostly because Ace uses JS workers and I'm by no means an expert with that). --Daimona Eaytoy (talk) 22:17, 21 November 2019 (+07)
I'm not a fan of the Ace Editor unfortunately. It doesn't work nicely at all for non-English scripts. I believe all sysops at Thai Wikipedia (especially me!) have suffered through this problem. Luckily, you did include a button to switch to the classic editor, so things are not that bad. Note that the GitHub issue has been open since 2011, so I don't have much hope of this getting fixed. --Nullzero (talk) 22:38, 21 November 2019 (+07)
@Nullzero: Heh, well, it has its downsides. I've been hitting that bug as well, when dealing with "wild" characters, like these ones -- basically, any character with a non-fixed width. Unfortunately, I'm not aware of suitable alternatives, especially because Ace is what's used by MW's CodeEditor extension. --Daimona Eaytoy (talk) 22:59, 21 November 2019 (+07)

By removing variable hoisting:

a := [1];
false & (a[] := 2);
a[0] === 1

will now evaluate to true, whereas it's currently false. I think the new behavior makes more sense, and I could not understand the rationale of the current behavior at all. Could you explain why it's currently doing that, in case I'm missing anything?

By the way, I just realized that my old email is whitelisted as I'm an (inactive) Pywikibot maintainer. I no longer use that email however, so I do need to get my new email whitelisted. So thanks again for your help! --Nullzero (talk) 00:28, 23 November 2019 (+07)

@Nullzero: Sure, it's fine that it's returning true. The current behaviour is for cases like: a := [1]; a[DUNDEFINED] := 5; a[0] === 1. This should return false, because DUNDEFINED could potentially be 0. As for the whitelist, I'm sure someone will pick it up soon. In the meantime, I'll keep triggering jenkins manually. --Daimona Eaytoy (talk) 00:43, 23 November 2019 (+07)

Reply and another question

I'm slightly lost, so please correct me if I'm wrong.

From this line, it says that during a short-circuit, for each variable in the branch that is not evaluated:

  1. if it's not already set, then set it to the undefined value. This one makes perfect sense.
  2. if it's set but the current value is an array, reassign the variable to the undefined value. This is the one that I'm very confused about.

There are two modes of evaluation: Check Syntax, which disables short-circuiting, and actual evaluation, which enables short-circuiting. Therefore, hoisting only occurs in the actual evaluation. Note that in the actual evaluation, we always know which branch we should take, since we have a concrete variable environment. What I am confused about, then, is this: since we know exactly which branch we are going to take, why do assignment statements in the branch that we do not take affect us at all?

The issue about a := [1]; a[DUNDEFINED] := 5; a[0] === 1 is different because it is a straight-line code which has no hoisting. a[0] === 1 is undefined because of the array assignment operation, not because of hoisting.

To look at this from another angle, compare: a := 1; (false & (a := 2)); a with a := [1]; (false & (a := 2)); a. They are the same in every way except the initial value of a. Yet, one evaluates to the undefined value and the other does not. I think this is very weird.
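
The two hoisting rules listed above can be modelled in a few lines, which makes the asymmetry between the two programs easy to see. This is a simplified model of the behaviour as described in this thread, not the extension's PHP code; the function name, the UNDEFINED sentinel, and the dictionary-based environment are assumptions.

```python
UNDEFINED = "DUNDEFINED"  # sentinel for the undefined value in this model

def hoist_skipped_branch(assigned_names, env):
    """Apply the two rules to the variables assigned in a branch that
    short-circuiting skipped. `env` maps variable names to current values."""
    for name in assigned_names:
        if name not in env:
            env[name] = UNDEFINED            # rule 1: not set yet
        elif isinstance(env[name], list):
            env[name] = UNDEFINED            # rule 2: set, but currently an array
    return env

# a := 1;   (false & (a := 2)); a
scalar_env = hoist_skipped_branch(["a"], {"a": 1})
# a := [1]; (false & (a := 2)); a
array_env = hoist_skipped_branch(["a"], {"a": [1]})
```

In this model `scalar_env["a"]` is still 1, while `array_env["a"]` has become the undefined value, even though the skipped branch is identical in both programs.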

I also have another question. What I require to implement the proposed algorithm (that would eliminate the need of DUNDEFINED completely) is a map from action to a set of built-in variables that are bound under that action. For example:

[
  'edit' => ['timestamp', 'added_lines', ...],
  'upload' => ['timestamp', 'file_size', ...],
  ...
]

How can I get this map? Note that there are multiple extensions that add more actions and built-in variables to the filter language, and different wikis have different extensions enabled, so it's not possible to hard-code this data. I need a reliable, automated way to query this information.

By the way, I commented at meta:Requests_for_comment/Creating_abusefilter-manager_global_group. I unequivocally support it as the group is necessary to get various pending work done. Let me know if there's anything I can help with. Do I understand correctly that the current result of this RfC is not looking good?

Thanks --Nullzero (talk) 14:11, 29 November 2019 (+07)

@Nullzero: As for UNDEFINED in arrays: that line is very raw, in that it doesn't know how the array is modified in the inner nodes. All it knows is that it will be modified, hence I decided to err on the side of safety, and reset the whole array. Basically, the only reason for it is to avoid having to keep track of how inner assignments modify arrays.
As for the map you're asking about, currently there's no way to get that. As you noted, extensions can add further variables, so it's not easy to get this data. Moreover, it's not guaranteed that the same action will *always* have the same variables (e.g. the example we talked about above in this page about user_ variables not being set for account creations, iff the creator is anon). To get such a map, we'd first need to enforce that every action always has the same variables, both in AF and other extensions. It shouldn't be super-hard, but not straightforward either. Then building the map would come as a bonus.
Lastly, the meta RFC. Unfortunately, there doesn't seem to be a clear consensus about that. If we need very urgent fixes we can ask a sysadmin, but that's definitely not optimal, and IMHO less transparent. --Daimona Eaytoy (talk) 21:40, 29 November 2019 (+07)