So I was procrastinating on HN and I saw the following post « Elm and Phoenix/Elixir in Production for France TV », something like 3 days ago. You can find the HN comments here.
It’s been some time I read about Elm and Elixir on HN, but I did not have the chance, or took the time to try these two piece of technology.
So here we are, I’ll try to implement the Norvig’s Did you mean in Elixir, and give you my feeling about it in the same time. It’s probably not the best example, however that’s a very small program that you can easily implement in a couple of hours, even if you don’t know the language.
Keep in mind that I have absolutely zero experience with Elixir nor Erlang, that’s why as always, feel free to show me the correct way :)
You can also refer to my old post of 2011 on Did You Mean, implemented in Java, Python 3 and OCaml.
Installation
First cool thing? The docker install right on the installation page! Thank you for making it. I really don’t want to install it on my mac if I don’t have to.
Let’s run that docker container, and directly download the big.txt
file since we’re gonna need it in some minutes. Finally, we run iex.
$ docker run -it --rm elixir bash
root@f0a608c9a46f:~# wget http://norvig.com/big.txt
root@f0a608c9a46f:~# echo "the quick brown fox jumps over the lazy dog" > foo.txt
root@f0a608c9a46f:~# iex
Erlang/OTP 19 [erts-8.1] [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.3.4)
iex(1)>
Ready to rumble?
Loading the file
We need a function named load
that takes a file name, and will:
- load the content as a string (
File.read!
) - lowercase the whole thing (
String.downcase
) - scan for words (
Regex.scan(~r/\w+/, ...)
) - produce a map where keys are words, and values are the occurrences of this word in the original string.
- return that map
alias String, :as S
def load(f) do
Enum.reduce((Regex.scan(~r/\w+/, S.downcase (File.read! f)) |> List.flatten), %{}, fn word, counter ->
Map.update(counter, word, 1, &(&1 + 1))
end)
end
First, in Elixir, it seems that a function have to be declared in a module, so we’ll wrap all the functions in a defmodule DidYouMean
later on.
I aliased String
as S
since we’re gonna need some functions in the String module, and it helps in the reading process.
Reading the file and lowercasing it is really simple:
iex(1)> String.downcase (File.read! "foo.txt")
"the quick brown fox jumps over the lazy dog\n"
# it can also be written like that:
iex(2)> txt = File.read! "foo.txt" |> String.downcase
"the quick brown fox jumps over the lazy dog\n"
Searching for words is also easy using Regex.scan
:
iex(3)> Regex.scan(~r/\w+/, txt)
[["the"], ["quick"], ["brown"], ["fox"], ["jumps"], ["over"], ["the"], ["lazy"], ["dog"]]
# flatten it please!
iex(4)> Regex.scan(~r/\w+/, txt) |> List.flatten
["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
There’s one thing bothering me, the |>
operator let’s you feed arguments to the next function.
But you can’t place arguments where you want, for instance for Regex.scan
you can’t feed the string to it
using |>
. It would have been cool to have a specific operator or some wildcard token to place the arguments where needed. Something like "foo" |>2 Regex.scan ~r/../
.
Ok now we have our list of words, and we want count the occurrences and put the whole thing in a Map
.
I first tried this:
counter = Map.new
for word<-words, do: Map.update(counter, word, 1, &(&1 + 1))
counter
Well it’s not working, due to function naming and scoping issues. The for
makes you enter in a new specific scope where counter
is a new variable, so you can’t update the variable which was defined in the upper scope.
What you need to do is relying on Enum.reduce
for such purposes
iex(5)> words = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
iex(6)> Enum.reduce(words, %{}, fn word, counter ->
...(6)> Map.update(counter, word, 1, &(&1 + 1))
...(6)> end)
%{"brown" => 1, "dog" => 1, "fox" => 1, "jumps" => 1, "lazy" => 1, "over" => 1,
"quick" => 1, "the" => 2}
The &(&1 + 1)
thing represents an anonymous function which returns its first argument added to one.
The edits1
function
Here’s the beast:
def edits1(w) do
alphabet = S.codepoints("abcdefghijklmnopqrstuvwxyz")
splits = for i<-0..(S.length w)+1, do: {S.slice(w,0,i), S.slice(w,i,S.length w)}
(for {l,r}<-splits, do:
[l <> S.slice(r,1,S.length r), # deletes
l <> S.slice(r,1,1) <> S.slice(r,0,1) <> S.slice(r,2,S.length r), # transposes
(for c<-alphabet, do: l <> c <> S.slice(r,1,S.length r)), # replaces
(for c<-alphabet, do: l <> c <> r)]) |> List.flatten |> MapSet.new # inserts > to set
end
String concatenation is made with the <>
operator, ranges can be created using ..
so that 0..10
is an
iterable from 0 to 10 (included).
We need all letters of the alphabet, which we can get using String.codepoints("....")
, it makes an iterable of all the letters. At first I tried to be smart and all and create a range with ?a..?z
which works! The ?
before a character returns its codepoint. But I’ve found no easy way to do the opposite (going from 97
to a
). We could win one more line.
There’s nothing really hard in this par of the code, this is just standard String slicing. I would have liked the syntax to be easier, I miss the slicing like in Python, it makes for easier reading of the source code.
The edits2
function
Really simple there’s nothing specific going on
def edits2(w, nwords) do
(for e1<-edits1(w), e2<-edits1(e1), Map.has_key?(nwords, e1) do e2 end) |> MapSet.new
end
The for
iterates on two streams (think imbricated for) and you can filter using a predicate, here Map.has_key?
, in case the word satisfies the predicate it is returned.
The known
function
def known(words, nwords) do
kn = (for w<-words, Map.has_key?(nwords, w) do w end) |> MapSet.new
cond do MapSet.size(kn) == 0 -> nil
true -> kn end
end
The known
function iterates on the words
paremeter to verify that this word is known (in the nwords
set).
If the set is empty we return nil
(so that we can use the ||
operator later on), otherwise we return the set.
The correct
function
def correct(word, nwords) do
matches = known([word], nwords) || known(edits1(word), nwords) || edits2(word, nwords) || [word]
Enum.sort_by(matches, &(Map.get(nwords, &1, 0)), &>=/2) |> List.first
end
In the correct
method we try first if the word is known, in which case there’s no need to correct it.
Otherwise we search in the words created by edits1
, then in edits2
. If nothing is found, we return the word.
Finally we order these matches by the occurrences found while loading the big.txt
file. And we return the first match.
The whole program
defmodule DidYouMean do
alias String, as: S
def load(f) do
Enum.reduce((Regex.scan(~r/\w+/, S.downcase (File.read! f)) |> List.flatten), %{}, fn word, counter ->
Map.update(counter, word, 1, &(&1 + 1))
end)
end
def edits1(w) do
alphabet = S.codepoints("abcdefghijklmnopqrstuvwxyz")
splits = for i<-0..(S.length w)+1, do: {S.slice(w,0,i), S.slice(w,i,S.length w)}
(for {l,r}<-splits, do:
[l <> S.slice(r,1,S.length r), # deletes
l <> S.slice(r,1,1) <> S.slice(r,0,1) <> S.slice(r,2,S.length r), # transposes
(for c<-alphabet, do: l <> c <> S.slice(r,1,S.length r)), # replaces
(for c<-alphabet, do: l <> c <> r)]) |> List.flatten |> MapSet.new # inserts > to set
end
def edits2(w, nwords) do
(for e1<-edits1(w), e2<-edits1(e1), Map.has_key?(nwords, e1) do e2 end) |> MapSet.new
end
def known(words, nwords) do
kn = (for w<-words, Map.has_key?(nwords, w) do w end) |> MapSet.new
cond do MapSet.size(kn) == 0 -> nil
true -> kn end
end
def correct(word, nwords) do
matches = known([word], nwords) || known(edits1(word), nwords) || edits2(word, nwords) || [word]
Enum.sort_by(matches, &(Map.get(nwords, &1, 0)), &>=/2) |> List.first
end
end
Caveat
There’s only one thing I was not able to do, and I don’t know if it’s possible or not.
I wanted to store the result of the load
directly in the module so that we can directly call
DidYouMean.correct
and avoid passing the nwords
variable as a parameter of the correct
, edits2
and known
functions, but when doing so:
defmodule Foo do
def foo, do: 42
@f foo
end
It says:
** (CompileError) iex:23: undefined function foo/0
(elixir) expanding macro: Kernel.@/1
iex:23: Foo (module)
Testing the corrector
iex(17)> nwords = DidYouMean.load "big.txt"
iex(18)> DidYouMean.correct "speling", nwords
"spelling"
iex(19)> DidYouMean.correct "sujar", nwords
"sugar"
iex(20)> DidYouMean.correct "roopm", nwords
"room"
Feelings
Elixir seems cool, I’ve read much more than needed to write such a simple program, and it is definitely not the type of example where Elixir shines since it involves no concurrency nor supervising and advanced stuff.
That’s why I have a feeling of not gaining anything in using it for this problem, because I think there’s not really something to gain. However, the language feels concise and very pragmatic.
So this solves the challenge in 28 lines. Not bad, but I’m sure an experienced Elixir developer will show me how to shorten that :)
Until next time!