Server Bug Fix: Extract words with clue numbers in a Crossword puzzle

Original Source Link

Related: Read a crossword

Task

Given a completed Crossword puzzle, extract the words in it with their respective clue numbers, with “across” (horizontal) and “down” (vertical) words grouped and ordered like a real crossword puzzle.

The words are numbered in the row-major order of their first letters. If an “across” word and a “down” word share the first letter, the two words share one number. (As in regular Crossword, single-letter words don’t count; they should not be numbered or included in the output.)

The input is given as a character matrix (or equivalent), with letters being uppercase (or lowercase if you want) and non-letter cells being blank. You can assume that single-letter islands won’t appear in the input. If you want, you can assume the grid is rectangular (shorter rows are padded with spaces).

The output must be clearly grouped into “across” and “down” (you don’t need to return or print these strings though), and the words must be ordered in the increasing order of the clue number (see test cases).

Standard rules apply. The shortest code in bytes wins.

Test cases

Input:
MESS 
YAWN 
 SAID
 TYPO
Output:
Across: 1. MESS 5. YAWN 6. SAID 8. TYPO
Down: 1. MY 2. EAST 3. SWAY 4. SNIP 7. DO

Input:
   RECURS
  PARAPET
 TRIANGLE
COOLS RAW
HUTS MATE
ORE CODED
INCLUDES
RETIRES
SYSTEM
Output:
Across: 1. RECURS 7. PARAPET 8. TRIANGLE 9. COOLS 10. RAW 11. HUTS
        12. MATE 13. ORE 14. CODED 15. INCLUDES 17. RETIRES 18. SYSTEM
Down: 1. RAILS 2. ERAS 3. CAN 4. UPGRADES 5. RELATES 6. STEWED
      7. PROTECTS 8. TOURNEY 9. CHOIRS 12. MODEM 14. CURE 16. LIT

Input:
MICROWAVE
U       S
M OCEAN C
M  OWL  A
I SWELL P
E       E
SCHEDULES
Output:
Across: 1. MICROWAVE 3. OCEAN 7. OWL 8. SWELL 9. SCHEDULES
Down: 1. MUMMIES 2. ESCAPES 4. COW 5. EWE 6. ALL

Input:
TAB
A U
BUBBLE
  B O
  LOVED
  E E
Output:
Across: 1. TAB 3. BUBBLE 5. LOVED
Down: 1. TAB 2. BUBBLE 4. LOVE

Charcoal, 118 bytes

WS⊞υιP⪫υ¶≔⟦⟧θFυ«Fι«F‹ κ«≔✂ιⅈLι¹η≔⪫KD⁻Lυⅉ↓ωζ≡÷⌕2374ce6⍘↨EKV‹ μ²φ³¦²⊞θ⟦ηζ⟧¹⊞θ⟦ηω⟧⁰⊞θ⟦ωζ⟧»→»⸿»⎚E²⪫EΦEθ⟦⊕μ§⪪§λι ⁰⟧§λ¹⪫λ ¦ 

Try it online! Link is to verbose version of code. Takes input as a rectangular string of newline-terminated lines. Explanation:

WS⊞υιP⪫υ¶

Input the rectangle and print it to the canvas.

≔⟦⟧θ

Start off with no clues.

Fυ«Fι«

Loop over the rows and columns.

F‹ κ«

If the current character is not a space, then…

≔✂ιⅈLι¹η

Get the substring of the current line starting at this character.

≔⪫KD⁻Lυⅉ↓ωζ

Get the substring of the current column starting at this character by reading it from the canvas.

≡÷⌕2374ce6⍘↨EKV‹ μ²φ³

Check which orthogonally adjacent characters are spaces.

²⊞θ⟦ηζ⟧

If only the characters to the left and above are spaces, then a word starts both across and down.

¹⊞θ⟦ηω⟧

Otherwise if the character to the left is a space but the character to the right is not, then a word starts across but not down.

⁰⊞θ⟦ωζ⟧

Otherwise if the character above is a space but the character below is not, then a word starts down but not across.

»→»⸿»

Move on to consider the next character on each iteration.

Clear the canvas.

E²⪫EΦEθ⟦⊕μ§⪪§λι ⁰⟧§λ¹⪫λ ¦ 

Map over the across and down clues, numbering them all, but keeping only those clues with words in the respective direction.

(Out of interest I tried using just string manipulation instead of canvas operations but that cost 29 bytes.)

perl -M5.010, 348 bytes

@a=map{[s/ /0/gr=~/./g,0]}<>;[email protected],[(0)[email protected]{$a[0]}];for$y(0..$#a-1){for$x(0..$#{$a[0]}-1){if($a[$y][$x]){$h=$a[$y][$x+1]&&!$a[$y][$x-1];$v=$a[$y+1][$x]&&!$a[$y-1][$x];$c++if$h||$v;if($h){print" $c. ";$i=$x;print$a[$y][$i-1]while$a[$y][$i++]}[email protected],[$c,$y,$x]if$v}}}say;for(@v){($c,$y,$x)[email protected]$_;print" $c. ";$i=$y;print$a[$i-1][$x]while$a[$i++][$x]}say

Try it online!

Requires the input to be a proper rectangle, that is, shorter lines should be padded with spaces.

Python 2, 284 283 281 272 267 263 bytes

-2 bytes thanks to @Noodle9 for suggesting a different output format

x=input()
A=[];D=[]
d=1;b='!'
i=j=0
L=len
while i<L(x):r=x[i];h=(r[j:].split()or b)[0];c=L(h)*(j<1or-~-q)>1;g=(''.join(r[j]for r in x[i:]).split()or b)[0];q=r[j]>b;a=L(g)*q*(i<1or x[i-1][j]<b)>1;D+=a*[d,g];A+=c*q*[d,h];d+=a+c*q>0;j=(j<L(r)-1)*-~j;i+=j<1
print A,D

Try it online!

Takes input as a list of strings representing the rows.

The basic approach involves iterating over every square of the grid in left-to-right and top-to-bottom order. The counter d is incremented when a word can be made either across or down starting from the current square.

Some other notes

  • A and D keep track of the lists of “across” words and “down” words respectively

  • To determine whether a square contains a space or a letter, the character gets compared to the ! character (smaller means a space, larger means it’s a letter)

  • When a word gets added to one of the lists, .split()[0] is called on the remaining portion of the row/column to get the entire word, which ends at the next space character

Jelly, 49 bytes

Ȧ€Œg⁸ṁȦƇḊƇZ€ḢḢ,Ɗ€)ẎṢ
n⁶a⁸ŒĖṁ⁸Ç,ZÇ$ƊµẎZḢQṢiⱮⱮⱮȯ"""

A monadic Link which accepts a (rectangular) list of lists of characters which yields a list of lists of pairs of numbers and lists of characters
…i.e. [across, down] where each of across and down are lists of [clueNumber, answerWord].

Try it online! (The footer splits at newlines, calls the Link and formats the result like the question examples.)

How?

We find all the 2+ letter horizontally written words along with the (2d) index of their first letter in the input matrix for each of the input matrix and its transpose, then sort these (by those indices), and look up each of these starting-indices in a list of the unique starting-indices for the combined results to find the clue-numbers.

Ȧ€Œg⁸ṁȦƇḊƇZ€ḢḢ,Ɗ€)ẎṢ - helper Link: list of lists of pairs of 2-d indices and their
                                    characters (or 0s where spaces were)
                 )   - for each row:
Ȧ€                   -   for each pair: any and all? (0 if it contains a 0 (i.e. character was a space))
  Œg                 -   group runs of equal elements
    ⁸ṁ               -   mould like the input
       Ƈ             -   filter keep those for which:
      Ȧ              -     any and all? - i.e. throw away "words" of 0s
         Ƈ           -   filter keep those for which:
        Ḋ            -     dequeue - i.e. throw away words of length 1
          Z€         -   transpose each
                €    -   for each:
               Ɗ     -     last three links as a monad - f(X=that):
            Ḣ        -       head (modifies X too)
             Ḣ       -       head (modifies X too)
              ,      -       pair with (the modified X)
                  Ẏ  - tighten
                   Ṣ - sort

n⁶a⁸ŒĖṁ⁸Ç,ZÇ$ƊµẎZḢQṢiⱮⱮⱮȯ""" - Link: list of lists of characters (C)
 ⁶                           - space character
n                            - (C) not equal (space)? (vectorises)
  a⁸                         - logical AND (C) (vectorises) -> C but with 0s not spaces
    ŒĖ                       - multi-dimensional enumerate -> [[[1,1],'A'],...,[[n,m],'Z']]
      ṁ⁸                     - mould like C
             Ɗ               - last three links as a monad - f(X=that):
        Ç                    -   call the helper Link with X
            $                -   last two links as a monad:
          Z                  -     transpose X
           Ç                 -     call the helper Link with that
         ,                   -   pair these together
              µ              - start a new monadic chain - f(V=that):
               Ẏ             - tighten V
                Z            - transpose
                 Ḣ           - head -> all word-start indices
                  Q          - deduplicate
                   Ṣ         - sort
                     ⱮⱮⱮ     - 3-deep map across v in V with:
                    i        -   first 1-based index of v or 0 if not found
                         """ - zip across, 3-deep with:
                        ȯ    -   logical OR (V) -> replace the 0s with the words from V

JavaScript (ES6), 179 bytes

Takes input as a matrix of characters. Returns a pair of objects.

m=>m.map(F=(r,y)=>r.map((c,x)=>i+=1/c?0:(g=(d,s=c,Y=y)=>!(C=m[Y+=d]&&m[Y][x+=!d])|C<F?s[1]&&!!(o[d][i]=s):g(d,s+C,Y),!y||m[y-1][x]<F?g(1):0)|(!x|r[x-1]<F&&g(0))),i=1,o=[{},{}])&&o

Try it online!

Tagged : /

Leave a Reply

Your email address will not be published. Required fields are marked *