Bug 57543 - -webkit-visual-word does not work on words separated by multiple spaces
Summary: -webkit-visual-word does not work on words separated by multiple spaces
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 25298
  Show dependency treegraph
 
Reported: 2011-03-31 06:40 PDT by Xiaomei Ji
Modified: 2011-05-20 10:45 PDT (History)
7 users (show)

See Also:


Attachments
patch (8.17 KB, patch)
2011-05-02 18:47 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (27.84 KB, patch)
2011-05-03 17:08 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch (27.85 KB, patch)
2011-05-04 23:20 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (27.22 KB, patch)
2011-05-10 14:16 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (28.33 KB, patch)
2011-05-10 16:42 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (28.27 KB, patch)
2011-05-11 11:36 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (27.90 KB, patch)
2011-05-16 11:20 PDT, Xiaomei Ji
rniwa: review-
Details | Formatted Diff | Diff
patch w/ layout test (26.96 KB, patch)
2011-05-17 16:02 PDT, Xiaomei Ji
rniwa: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xiaomei Ji 2011-03-31 06:40:42 PDT
the algorithm looks for words in visual order by looping though inline boxes in visual order and stops at the first word break found.

In the case of multi-spaces, word break among the collapsed spaces might not be added.
 For example, "<div>abc    def hij    opq rst</div>", no word break is added between "hij" and "opq".
    The reason is when cursor is immediately to the right of "opq", its previous word boundary's offset returned from previousWordPosition() is 15,
 which is immediately to the right of "hij ", and which is not in the same box as "opq",
  so, it wont be added as a word break in box "opq rst".
   To solve this, we could always add the rightmost(for LTR box)/leftmost(for RTL box) boundary
     as wordbreak when its inlineBox is the same as 'box'. So, a word break between "hij" and "opq" will be added when we traverse
   box "def hij   ". Checking that the rightmost or leftmost boundary's inlineBox get from
    getInlineBoxAndOffset is the same as 'box' is important as to not break cases such as <div>he<b>llo</b> world</div>, in which
   when cursor is immediately to the right of 'he', the rightmost boundary's inlineBox get from getInlineBoxAndOffset is
   box 'llo', which is different from box 'he', and this rightmost boundary wont be added as a word break.
     But it still might break some other cases since a boundary of a box does not necessarily mean a word break.
   The ultimate solution is to provide a way not to canonicalize(upstream) VisiblePosition in its constructor, 
 so previousWordPosition() of "opq" could return offset 18, which is immediately to the left of "opq", instead.
Comment 1 Xiaomei Ji 2011-04-05 09:31:46 PDT
This is for the algorithm implementing moving caret by word in visual order, which is bug 25298 is for.
It is not about the current implementation which moves caret by word in logical order.
Comment 2 Xiaomei Ji 2011-05-02 18:47:12 PDT
Created attachment 92033 [details]
patch

work in progress.
Comment 3 Xiaomei Ji 2011-05-03 17:08:00 PDT
Created attachment 92173 [details]
patch w/ layout test
Comment 4 Early Warning System Bot 2011-05-03 17:17:13 PDT
Attachment 92173 [details] did not build on qt:
Build output: http://queues.webkit.org/results/8547016
Comment 5 WebKit Review Bot 2011-05-04 15:34:51 PDT
Attachment 92173 [details] did not pass chromium-ews:
Output: http://queues.webkit.org/results/8552699
Comment 6 Xiaomei Ji 2011-05-04 23:20:36 PDT
Created attachment 92380 [details]
patch

fix compilation error in qt and chromium.
Comment 7 Ryosuke Niwa 2011-05-06 17:39:43 PDT
Comment on attachment 92380 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92380&action=review

> Source/WebCore/editing/visible_units.cpp:1160
> +static VisiblePosition visuallyFirstWordBreakInBoxInsideBlockWithSameDirectionality(const InlineBox* box, const VisiblePosition& wordBreak, int& offsetOfWordBreak) 

I'm not sure if "visuallyFirstWordBreak" is descriptive here.  How about something like beforeVisuallyFirstWordInBoxInsideBlockWithSameDirectionality?

Also, it may not be worth adding this function at all since it's called at exactly one place.  Can we just put this all in previousWordBreakInBoxInsideBlockWithSameDirectionality?

> Source/WebCore/editing/visible_units.cpp:1162
> +    VisiblePosition nextWord = nextWordPosition(wordBreak);

Nit: s/nextWord/positionAfterCurrentWord/

> Source/WebCore/editing/visible_units.cpp:1163
> +    VisiblePosition wordPosition = previousWordPosition(nextWord);

Nit: s/wordPosition/positionBeforeCurrentWord/

> Source/WebCore/editing/visible_units.cpp:1186
> +        && ((box->isLeftToRightDirection() && box->nextLeafChild()) 
> +        || (!box->isLeftToRightDirection() && box->prevLeafChild()))) {

I'm not following here.  Shouldn't we be checking box->isLeftToRightDirection() && !box->prevLeafChild() instead?

> Source/WebCore/editing/visible_units.cpp:1296
> +    VisiblePosition wordBreak = Position(box->renderer()->node(), box->caretMaxOffset(), Position::PositionIsOffsetInAnchor);
> +
> +    VisiblePosition nextWord = previousWordPosition(wordBreak);
> +    wordBreak = nextWordPosition(nextWord);

Ditto about these variable names.
Comment 8 Xiaomei Ji 2011-05-09 10:54:38 PDT
(In reply to comment #7)
> (From update of attachment 92380 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=92380&action=review
> 
> > Source/WebCore/editing/visible_units.cpp:1160
> > +static VisiblePosition visuallyFirstWordBreakInBoxInsideBlockWithSameDirectionality(const InlineBox* box, const VisiblePosition& wordBreak, int& offsetOfWordBreak) 
> 
> I'm not sure if "visuallyFirstWordBreak" is descriptive here.  How about something like beforeVisuallyFirstWordInBoxInsideBlockWithSameDirectionality?

It actually is the visually first word break in the box (if it is a word break inside the same box).

> 
> Also, it may not be worth adding this function at all since it's called at exactly one place.  Can we just put this all in previousWordBreakInBoxInsideBlockWithSameDirectionality?

I do not have strong objections. But separating it out seems more consistent with the current style (making functions more fine grained and more readable), for example, lastWordBreakInBox() is also only used once.

> 
> > Source/WebCore/editing/visible_units.cpp:1186
> > +        && ((box->isLeftToRightDirection() && box->nextLeafChild()) 
> > +        || (!box->isLeftToRightDirection() && box->prevLeafChild()))) {
> 
> I'm not following here.  Shouldn't we be checking box->isLeftToRightDirection() && !box->prevLeafChild() instead?
> 

The word break collecting here is actually the rightmost word break for a LTR box or leftmost word break for a RTL box.  Given LTR box as an example, we do not need to save its rightmost word break if it is the visually rightmost box (has no nextLeafChild).
Comment 9 Ryosuke Niwa 2011-05-09 15:14:50 PDT
Comment on attachment 92380 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92380&action=review

>>> Source/WebCore/editing/visible_units.cpp:1186
>>> +        || (!box->isLeftToRightDirection() && box->prevLeafChild()))) {
>> 
>> I'm not following here.  Shouldn't we be checking box->isLeftToRightDirection() && !box->prevLeafChild() instead?
> 
> The word break collecting here is actually the rightmost word break for a LTR box or leftmost word break for a RTL box.  Given LTR box as an example, we do not need to save its rightmost word break if it is the visually rightmost box (has no nextLeafChild).

But your condition says box->isLTR() AND box->nextLeafChild().  Under this condition, the box is not the rightmost box.
Comment 10 Ryosuke Niwa 2011-05-09 15:15:07 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 92380 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=92380&action=review
> > 
> > > Source/WebCore/editing/visible_units.cpp:1160
> > > +static VisiblePosition visuallyFirstWordBreakInBoxInsideBlockWithSameDirectionality(const InlineBox* box, const VisiblePosition& wordBreak, int& offsetOfWordBreak) 
> > 
> > I'm not sure if "visuallyFirstWordBreak" is descriptive here.  How about something like beforeVisuallyFirstWordInBoxInsideBlockWithSameDirectionality?
> 
> It actually is the visually first word break in the box (if it is a word break inside the same box).

But what kind of word break is it?  Is that before a word or after a word?  It's not clear from the function name.

> > 
> > Also, it may not be worth adding this function at all since it's called at exactly one place.  Can we just put this all in previousWordBreakInBoxInsideBlockWithSameDirectionality?
> 
> I do not have strong objections. But separating it out seems more consistent with the current style (making functions more fine grained and more readable), for example, lastWordBreakInBox() is also only used once.

The fact we need such a long function name might imply that this shouldn't really be a function.
Comment 11 Xiaomei Ji 2011-05-09 15:50:19 PDT
Comment on attachment 92380 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92380&action=review

>>>> Source/WebCore/editing/visible_units.cpp:1160
>>>> +static VisiblePosition visuallyFirstWordBreakInBoxInsideBlockWithSameDirectionality(const InlineBox* box, const VisiblePosition& wordBreak, int& offsetOfWordBreak) 
>>> 
>>> I'm not sure if "visuallyFirstWordBreak" is descriptive here.  How about something like beforeVisuallyFirstWordInBoxInsideBlockWithSameDirectionality?
>>> 
>>> Also, it may not be worth adding this function at all since it's called at exactly one place.  Can we just put this all in previousWordBreakInBoxInsideBlockWithSameDirectionality?
>> 
>> It actually is the visually first word break in the box (if it is a word break inside the same box).
> 
> But what kind of word break is it?  Is that before a word or after a word?  It's not clear from the function name.

in this case (box and block have the same directionality), it is the position before next word or it could be the position before current word in the case that the boundary of the box is not a word break boundary.
For example: <div>opq <span>abc<b>def </b> hij</span>, when box is "abc", the first visual word break is position before "abc" (position before current word). when box is "def", the first visual word break is position before "hij" (if this position is in the same inlineBox as "def"), which is a position before next word.

We did not expose whether the word break is before a word or after a word in the function names of ....WithSameDirectionality() and ....WithDifferentDirectionality() and their callee.
Inside previousWordBreakInBoxInsideBlockWithSameDirectionality() and nextWordBreakInBoxInsideBlockWithDifferentDirectionality(), there is comment saying where the word break should be, those functions and their callees follow the rule.

If you feel we should put those functions inside their caller, I can do that.

>> Source/WebCore/editing/visible_units.cpp:1162
>> +    VisiblePosition nextWord = nextWordPosition(wordBreak);
> 
> Nit: s/nextWord/positionAfterCurrentWord/

It could be positionAfterCurrentWord or positionAfterNextWord. Maybe change 'wordBreak' to 'visiblePosition" and change 'nextWord' to 'positionAfterWordBreak'.

>> Source/WebCore/editing/visible_units.cpp:1163
>> +    VisiblePosition wordPosition = previousWordPosition(nextWord);
> 
> Nit: s/wordPosition/positionBeforeCurrentWord/

maybe s/wordPosition/positionBeforeWordBreak/?

>>>> Source/WebCore/editing/visible_units.cpp:1186
>>>> +        || (!box->isLeftToRightDirection() && box->prevLeafChild()))) {
>>> 
>>> I'm not following here.  Shouldn't we be checking box->isLeftToRightDirection() && !box->prevLeafChild() instead?
>> 
>> The word break collecting here is actually the rightmost word break for a LTR box or leftmost word break for a RTL box.  Given LTR box as an example, we do not need to save its rightmost word break if it is the visually rightmost box (has no nextLeafChild).
> 
> But your condition says box->isLTR() AND box->nextLeafChild().  Under this condition, the box is not the rightmost box.

We need to collect the rightmost word break if the box is *not* the rightmost box. And we do *not* need to collect the rightmost word break if the box is the rightmost box.
Comment 12 Ryosuke Niwa 2011-05-09 16:03:39 PDT
(In reply to comment #11)
> We did not expose whether the word break is before a word or after a word in the function names of ....WithSameDirectionality() and ....WithDifferentDirectionality() and their callee.
> Inside previousWordBreakInBoxInsideBlockWithSameDirectionality() and nextWordBreakInBoxInsideBlockWithDifferentDirectionality(), there is comment saying where the word break should be, those functions and their callees follow the rule.

That's bad :(

> >> Source/WebCore/editing/visible_units.cpp:1162
> >> +    VisiblePosition nextWord = nextWordPosition(wordBreak);
> > 
> > Nit: s/nextWord/positionAfterCurrentWord/
> 
> It could be positionAfterCurrentWord or positionAfterNextWord. Maybe change 'wordBreak' to 'visiblePosition" and change 'nextWord' to 'positionAfterWordBreak'.

positionAfterWord.  WordBreak, by definition, is before or after a word and it's redundant.

> >> Source/WebCore/editing/visible_units.cpp:1163
> >> +    VisiblePosition wordPosition = previousWordPosition(nextWord);
> > 
> > Nit: s/wordPosition/positionBeforeCurrentWord/
> 
> maybe s/wordPosition/positionBeforeWordBreak/?

Ditto; positionBeforeWord

> > But your condition says box->isLTR() AND box->nextLeafChild().  Under this condition, the box is not the rightmost box.
> 
> We need to collect the rightmost word break if the box is *not* the rightmost box. And we do *not* need to collect the rightmost word break if the box is the rightmost box.

Why is that?
Comment 13 Xiaomei Ji 2011-05-09 16:37:35 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > We did not expose whether the word break is before a word or after a word in the function names of ....WithSameDirectionality() and ....WithDifferentDirectionality() and their callee.
> > Inside previousWordBreakInBoxInsideBlockWithSameDirectionality() and nextWordBreakInBoxInsideBlockWithDifferentDirectionality(), there is comment saying where the word break should be, those functions and their callees follow the rule.
> 
> That's bad :(


s/nextWordBreakInBoxInsideBlockWithDifferentDirectionality/positionAfterWordInBoxInsideBlockWithDifferentDirectionality/ ?  (I feel the previous one is better, but I can not say positonAfterNextWord because it could be positonAfterCurrentWord or positionAfterNextWord).

s/visuallyFirstWordBreakInBoxInsideBlockWithDifferentDirectionality/visuallyFirstPositionAfterWordInBoxInsideBlockWithDifferentDirectionality/ ?  (maybe inline the function in).

s/previousWordBreakInBoxInsideBlockWithSameDirectionality/positionBeforeWordInBoxInsideBlockWithSameDirectionality/ ?


inline visuallyFirstWordBreakInBoxInsideBlockWithSameDirectionality ?


> 
> > > But your condition says box->isLTR() AND box->nextLeafChild().  Under this condition, the box is not the rightmost box.
> > 
> > We need to collect the rightmost word break if the box is *not* the rightmost box. And we do *not* need to collect the rightmost word break if the box is the rightmost box.
> 
> Why is that?

we do not need to save the rightmost word break (the visually rightmost position in a line). For example, 
"abc def", when caret is inside 'def', caret stays the same position when press ctrl-right-arrow (following widnow's native support).
Comment 14 Xiaomei Ji 2011-05-10 14:16:22 PDT
Created attachment 93006 [details]
patch w/ layout test

per comment #7, inlined those 2 newly added functions (visuallyFirstWordBreakxxx) and changed the variable names to be "positionAfterWord"/"positionBeforeWord".
Comment 15 Ryosuke Niwa 2011-05-10 15:49:40 PDT
Comment on attachment 93006 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93006&action=review

> Source/WebCore/editing/visible_units.cpp:1176
> +

It's not too obvious here that we're at the end of the current box.  I think we should combine the outer if on hasSeenWordBreakInThisBox with a tertiary operator above that constructs wordBreak so that the fact wordBreak is at the end of the current box becomes clear.
Comment 16 Xiaomei Ji 2011-05-10 16:42:53 PDT
Created attachment 93048 [details]
patch w/ layout test

updated per comment #15.
Comment 17 Ryosuke Niwa 2011-05-10 18:35:37 PDT
Comment on attachment 93048 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93048&action=review

> Source/WebCore/editing/visible_units.cpp:1178
> +        // it is not covered by the box previously visited. For example, given a logical text 

Nit: if it is not in the previously visited boxes.

> Source/WebCore/editing/visible_units.cpp:1179
> +        // "abc def     hij opq", there are 2 boxes "abc def " (starts at 0 and length is 8) and 

Nit: You forgot a preposition before "abc...".  In "abc.."?

> Source/WebCore/editing/visible_units.cpp:1182
> +        // "hij opq" (starts at 12 and length is 7). The word breaks are at position 3 (which is 
> +        // right after "abc"), position 8 (which is right after "def "), and position 15 (which is 
> +        // right after "hij"). Word break betwen "def" and "hij" (which is position 8) does not 

This paragraph is really verbose.  Can we just use | notation?  I think people get what you mean.

Also maybe we should state in which order we're traversing these boxes and which one is the current box and which one is the next leaf child.

> Source/WebCore/editing/visible_units.cpp:1297
> +    const InlineBox* box, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& isLastWordBreakInBox, bool& previousWordBreakIsInBoundary)

Nit: s/previousWordBreakIsInBoundary/previousWordBreakWasAtBoundary/

In addition, it'll be nice to say whether it's at start (left for LTR/right for RTL) or end (right for LTR/left for RTL).
Comment 18 Ryosuke Niwa 2011-05-10 18:35:37 PDT
Comment on attachment 93048 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93048&action=review

> Source/WebCore/editing/visible_units.cpp:1178
> +        // it is not covered by the box previously visited. For example, given a logical text 

Nit: if it is not in the previously visited boxes.

> Source/WebCore/editing/visible_units.cpp:1179
> +        // "abc def     hij opq", there are 2 boxes "abc def " (starts at 0 and length is 8) and 

Nit: You forgot a preposition before "abc...".  In "abc.."?

> Source/WebCore/editing/visible_units.cpp:1182
> +        // "hij opq" (starts at 12 and length is 7). The word breaks are at position 3 (which is 
> +        // right after "abc"), position 8 (which is right after "def "), and position 15 (which is 
> +        // right after "hij"). Word break betwen "def" and "hij" (which is position 8) does not 

This paragraph is really verbose.  Can we just use | notation?  I think people get what you mean.

Also maybe we should state in which order we're traversing these boxes and which one is the current box and which one is the next leaf child.

> Source/WebCore/editing/visible_units.cpp:1297
> +    const InlineBox* box, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& isLastWordBreakInBox, bool& previousWordBreakIsInBoundary)

Nit: s/previousWordBreakIsInBoundary/previousWordBreakWasAtBoundary/

In addition, it'll be nice to say whether it's at start (left for LTR/right for RTL) or end (right for LTR/left for RTL).
Comment 19 Xiaomei Ji 2011-05-11 11:36:57 PDT
Created attachment 93152 [details]
patch w/ layout test

updated per comment #18.

>> Source/WebCore/editing/visible_units.cpp:1297
>> +    const InlineBox* box, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& isLastWordBreakInBox, bool& previousWordBreakIsInBoundary)

>Nit: s/previousWordBreakIsInBoundary/previousWordBreakWasAtBoundary/

>In addition, it'll be nice to say whether it's at start (left for LTR/right for RTL) or end (right for LTR/left for RTL).

changed it to previousWordBreakWasAtBoxStart.
Comment 20 Ryosuke Niwa 2011-05-13 11:09:41 PDT
Comment on attachment 93152 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93152&action=review

> Source/WebCore/editing/visible_units.cpp:1183
> +        // "abc |def |    hij |opq". We are traversing these boxes from right to left. The word 
> +        // break between "def" and "hij" (which is position 8) is not in box "hij opq", and it 
> +        // should be added as a word break (the rightmost word break) when traversing box "abc def".

It's still not clear which of these two boxes correspond to this "box" (current one).

> Source/WebCore/editing/visible_units.cpp:1277
> +    if (wordBreak == previousWordBreak)
> +        return VisiblePosition();    
> +   

Why do we need this early exit?

> Source/WebCore/editing/visible_units.cpp:1327
> +            offsetOfWordBreak = offset;
> +            isLastWordBreakInBox = false;
> +            previousWordBreakWasAtBoxStart = true;
> +            return positionAfterWord;

This makes me think that we might want to bundle up all these functions as a class.
Comment 21 Xiaomei Ji 2011-05-16 11:20:25 PDT
Created attachment 93675 [details]
patch w/ layout test

(In reply to comment #20)
> (From update of attachment 93152 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=93152&action=review
> 
> > Source/WebCore/editing/visible_units.cpp:1183
> > +        // "abc |def |    hij |opq". We are traversing these boxes from right to left. The word 
> > +        // break between "def" and "hij" (which is position 8) is not in box "hij opq", and it 
> > +        // should be added as a word break (the rightmost word break) when traversing box "abc def".
> 
> It's still not clear which of these two boxes correspond to this "box" (current one).

Changed to:
        // "abc |def |    hij |opq". We are traversing these boxes from right to left. When
        // traversing box "abc def", the word break between "def" and "hij" (which is position 8)
        // is added as the rightmost word break.


> 
> > Source/WebCore/editing/visible_units.cpp:1277
> > +    if (wordBreak == previousWordBreak)
> > +        return VisiblePosition();    
> > +   
> 
> Why do we need this early exit?

You are right. They are actually not needed.

> 
> > Source/WebCore/editing/visible_units.cpp:1327
> > +            offsetOfWordBreak = offset;
> > +            isLastWordBreakInBox = false;
> > +            previousWordBreakWasAtBoxStart = true;
> > +            return positionAfterWord;
> 
> This makes me think that we might want to bundle up all these functions as a class.

that could be a good idea. Filed https://bugs.webkit.org/show_bug.cgi?id=60910 for record.
Comment 22 Ryosuke Niwa 2011-05-16 15:59:28 PDT
Comment on attachment 93675 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93675&action=review

> Source/WebCore/editing/visible_units.cpp:1182
> +        // and the "hij opq" (starts at 12 and length is 7). The word breaks are 
> +        // "abc |def |    hij |opq". We are traversing these boxes from right to left. When
> +        // traversing box "abc def", the word break between "def" and "hij" (which is position 8)

You need to clarify why we need to special-case this and which one of "abc def" or "hij opq" is current box.

How about something like this?

We normally catch the line break between "def" and "hij" when we visit the box that contains "hij opq"
but the line break doesn't exist in the box that contains "hij opq" when there are multiple spaces.
So we detect it when we're traversing the box that contains "abc def " instead.

> Source/WebCore/editing/visible_units.cpp:1190
> +            positionBeforeWord.getInlineBoxAndOffset(boxContainingPreviousWordBreak, offsetOfWordBreak);

To address the issue below, you should declare boxContainingPreviousWordBreak here.

> Source/WebCore/editing/visible_units.cpp:1201
> -    InlineBox* boxContainingPreviousWordBreak;
>      wordBreak.getInlineBoxAndOffset(boxContainingPreviousWordBreak, offsetOfWordBreak);

I don't think we should be reusing variable like this.  Please declare boxContainingPreviousWordBreak here.

> Source/WebCore/editing/visible_units.cpp:1321
> +        InlineBox* boxContainingWordBreak;
> +        int offset;
> +        positionAfterWord.getInlineBoxAndOffset(boxContainingWordBreak, offset);
> +    
> +        if (boxContainingWordBreak == box) {

Can we extract this as a function?  e.g. positionIsInBox(const VisiblePosition&, InlineBox*)?

> Source/WebCore/editing/visible_units.cpp:1330
> +    previousWordBreakWasAtBoxStart = false;
> +
> +    VisiblePosition wordBreak = (hasSeenWordBreakInThisBox && !previousWordBreakWasAtBoxStart) ? previousWordBreak : Position(box->renderer()->node(), box->caretMinOffset(), Position::PositionIsOffsetInAnchor);

Huh? Isn't previousWordBreakWasAtBoxStart always false here?  r- because of this.
Comment 23 Xiaomei Ji 2011-05-16 19:10:03 PDT
> > Source/WebCore/editing/visible_units.cpp:1330
> > +    previousWordBreakWasAtBoxStart = false;
> > +
> > +    VisiblePosition wordBreak = (hasSeenWordBreakInThisBox && !previousWordBreakWasAtBoxStart) ? previousWordBreak : Position(box->renderer()->node(), box->caretMinOffset(), Position::PositionIsOffsetInAnchor);
> 
> Huh? Isn't previousWordBreakWasAtBoxStart always false here?  r- because of this.

Hm.. you are correct. But I have not been able to create a test case for it.
Comment 24 Xiaomei Ji 2011-05-17 16:02:26 PDT
Created attachment 93839 [details]
patch w/ layout test

> > Source/WebCore/editing/visible_units.cpp:1330
> > +    previousWordBreakWasAtBoxStart = false;
> > +
> > +    VisiblePosition wordBreak = (hasSeenWordBreakInThisBox && !previousWordBreakWasAtBoxStart) ? previousWordBreak : Position(box->renderer()->node(), box->caretMinOffset(), Position::PositionIsOffsetInAnchor);
> 
> Huh? Isn't previousWordBreakWasAtBoxStart always false here?  r- because of this.

Thanks for catching it!

I did not find test cases that the logic (for getting the rightmost word break in RTL box or leftmost word break in LTR box) added in nextWordBreakInBoxInsideBlockWithDifferentDirectionality() helps. And I removed the logic completely.

It was added to solve the word break separated by multiple spaces originally. But the following cases show that they are not needed (using RTL box in LTR block as example).

Case 1: in LTR block, a RTL box followed by another RTL box and are separated by multiple spaces.
logical text: ABC DEF    HIJ OPQ
visual display: QPO JIH FED CBA
there are 2 boxes: box 1 "ABC DEF", starts at 0, length is 8; box2 "HIJ OPQ", starts at 11, length is 7.
The word break between "DEF" and "HIJ" is at position 7, which belongs to box "HIJ OPQ".

Case 2: in LTR box, a RTL box followed by a LTR box and are separated by multiple spaces.
logical text "ABC DEF HIJ    abc def hij"
visual order "JIH FED CBA acb def hij".
There are 3 boxes: box 1 "ABC DEF HIJ", starts at 0, length is 11; box 2: " " (space), starts at 11, length is 1; box 3: "abc def hij", starts at 15, length is 11.
The word break between "HIJ" and "abc" falls in box 2 and is caught by the logic added in previousWordBreakInBoxInsideBlockWithSameDirectionality().
Comment 25 Ryosuke Niwa 2011-05-17 22:24:28 PDT
Comment on attachment 93839 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93839&action=review

> Source/WebCore/editing/visible_units.cpp:-1341
> +    bool isLastWordBreakInBox = false;
>      while (1) {
> -        bool isLastWordBreakInBox = false;

Is this change necessary after you've removed the change in nextWordBreakInBoxInsideBlockWithDifferentDirectionality?
Comment 26 Xiaomei Ji 2011-05-18 10:26:19 PDT
(In reply to comment #25)
> (From update of attachment 93839 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=93839&action=review
> 
> > Source/WebCore/editing/visible_units.cpp:-1341
> > +    bool isLastWordBreakInBox = false;
> >      while (1) {
> > -        bool isLastWordBreakInBox = false;
> 
> Is this change necessary after you've removed the change in nextWordBreakInBoxInsideBlockWithDifferentDirectionality?

It is not necessary for functionality correctness (isLastWordBreakInBox is always assigned in the callee), but it is good performance wise, so, I kept it there.
Comment 27 Ryosuke Niwa 2011-05-19 15:03:27 PDT
Comment on attachment 93839 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=93839&action=review

>>> Source/WebCore/editing/visible_units.cpp:-1341
>>> -        bool isLastWordBreakInBox = false;
>> 
>> Is this change necessary after you've removed the change in nextWordBreakInBoxInsideBlockWithDifferentDirectionality?
> 
> It is not necessary for functionality correctness (isLastWordBreakInBox is always assigned in the callee), but it is good performance wise, so, I kept it there.

I don't think this will have any significant performance improvement.  In practice, smaller scope/lifetime will help compiler optimization.
Comment 28 Xiaomei Ji 2011-05-20 10:45:34 PDT
Committed r86966: <http://trac.webkit.org/changeset/86966>