You can further develop the CSS, but that isn’t the focus of this article. There is no question that the design can be prettier than what I am providing here!
Capture The Plain Text Input
We’ll set an onkeyup
event handler on the to call a JavaScript function called
convert()
that does what it says: convert the plain text into HTML. The conversion function should accept one parameter, a string, for the user’s plain text input entered into the element:
onkeyup
is a better choice than onkeydown
in this case, as onkeyup
will call the conversion function after the user completes each keystroke, as opposed to before it happens. This way, the output, which is refreshed with each keystroke, always includes the latest typed character. If the conversion is triggered with an onkeydown
handler, the output will exclude the most recent character the user typed. This can be frustrating when, for example, the user has finished typing a sentence but cannot yet see the final punctuation mark, say a period (.
), in the output until typing another character first. This creates the impression of a typo, glitch, or lag when there is none.
In JavaScript, the convert()
function has the following responsibilities:
- Encode the input in HTML.
- Process the input line-by-line and wrap each individual line in either a
or
HTML tag, whichever is most appropriate.
- Process the output of the transformations as a single string, wrap URLs in HTML
tags, and replace image file names with
elements.
And from there, we display the output. We can create separate functions for each responsibility. Let’s name them accordingly:
html_encode()
convert_text_to_HTML()
convert_images_and_links_to_HTML()
Each function accepts one parameter, a string, and returns a string.
Encoding The Input Into HTML
Use the html_encode()
function to HTML encode/sanitize the input. HTML encoding refers to the process of escaping or replacing certain characters in a string input to prevent users from inserting their own HTML into the output. At a minimum, we should replace the following characters:
<
with<
>
with>
&
with&
'
with'
"
with"
JavaScript does not provide a built-in way to HTML encode input as other languages do. For example, PHP has htmlspecialchars()
, htmlentities()
, and strip_tags()
functions. That said, it is relatively easy to write our own function that does this, which is what we’ll use the html_encode()
function for that we defined earlier:
function html_encode(input) {
const textArea = document.createElement("textarea");
textArea.innerText = input;
return textArea.innerHTML.split("
").join("\n");
}
HTML encoding of the input is a critical security consideration. It prevents unwanted scripts or other HTML manipulations from getting injected into our work. Granted, front-end input sanitization and validation are both merely deterrents because bad actors can bypass them. But we may as well make them work a little harder.
As long as we are on the topic of securing our work, make sure to HTML-encode the input on the back end, where the user cannot interfere. At the same time, take care not to encode the input more than once. Encoding text that is already HTML-encoded will break the output functionality. The best approach for back-end storage is for the front end to pass the raw, unencoded input to the back end, then ask the back-end to HTML-encode the input before inserting it into a database.
That said, this only accounts for sanitizing and storing the input on the back end. We still have to display the encoded HTML output on the front end. There are at least two approaches to consider:
- Convert the input to HTML after HTML-encoding it and before it is inserted into a database.
This is efficient, as the input only needs to be converted once. However, this is also an inflexible approach, as updating the HTML becomes difficult if the output requirements happen to change in the future. - Store only the HTML-encoded input text in the database and dynamically convert it to HTML before displaying the output for each content request.
This is less efficient, as the conversion will occur on each request. However, it is also more flexible since it’s possible to update how the input text is converted to HTML if requirements change.
Let’s use the convert_text_to_HTML()
function we defined earlier to wrap each line in their respective HTML tags, which are going to be either or
. To determine which tag to use, we will
split
the text input on the newline character (\n
) so that the text is processed as an array of lines rather than a single string, allowing us to evaluate them individually.
function convert_text_to_HTML(txt) {
// Output variable
let out="";
// Split text at the newline character into an array
const txt_array = txt.split("\n");
// Get the number of lines in the array
const txt_array_length = txt_array.length;
// Variable to keep track of the (non-blank) line number
let non_blank_line_count = 0;
for (let i = 0; i < txt_array_length; i++) {
// Get the current line
const line = txt_array[i];
// Continue if a line contains no text characters
if (line === ''){
continue;
}
non_blank_line_count++;
// If a line is the first line that contains text
if (non_blank_line_count === 1){
// ...wrap the line of text in a Heading 1 tag
out += ``;
// ...otherwise, wrap the line of text in a Paragraph tag.
} else {
out += `${line}
`;
}
}
return out;
}
In short, this little snippet loops through the array of split text lines and ignores lines that do not contain any text characters. From there, we can evaluate whether a line is the first one in the series. If it is, we slap a tag on it; otherwise, we mark it up in a
tag.
This logic could be used to account for other types of elements that you may want to include in the output. For example, perhaps the second line is assumed to be a byline that names the author and links up to an archive of all author posts.
Tagging URLs And Images With Regular Expressions
Next, we’re going to create our convert_images_and_links_to_HTML()
function to encode URLs and images as HTML elements. It’s a good chunk of code, so I’ll drop it in and we’ll immediately start picking it apart together to explain how it all works.
function convert_images_and_links_to_HTML(string){
let urls_unique = [];
let images_unique = [];
const urls = string.match(/https*:\/\/[^\s<),]+[^\s<),.]/gmi) ?? [];
const imgs = string.match(/[^"'>\s]+\.(jpg|jpeg|gif|png|webp)/gmi) ?? [];
const urls_length = urls.length;
const images_length = imgs.length;
for (let i = 0; i < urls_length; i++){
const url = urls[i];
if (!urls_unique.includes(url)){
urls_unique.push(url);
}
}
for (let i = 0; i < images_length; i++){
const img = imgs[i];
if (!images_unique.includes(img)){
images_unique.push(img);
}
}
const urls_unique_length = urls_unique.length;
const images_unique_length = images_unique.length;
for (let i = 0; i < urls_unique_length; i++){
const url = urls_unique[i];
if (images_unique_length === 0 || !images_unique.includes(url)){
const a_tag = `${url}`;
string = string.replace(url, a_tag);
}
}
for (let i = 0; i < images_unique_length; i++){
const img = images_unique[i];
const img_tag = ``;
const img_link = `${img_tag}`;
string = string.replace(img, img_link);
}
return string;
}
Unlike the convert_text_to_HTML()
function, here we use regular expressions to identify the terms that need to be wrapped and/or replaced with or
tags. We do this for a couple of reasons:
- The previous
convert_text_to_HTML()
function handles text that would be transformed to the HTML block-level elementsand
, and, if you want, other block-level elements such as
. Block-level elements in the HTML output correspond to discrete lines of text in the input, which you can think of as paragraphs, the text entered between presses of the Enter key.
- On the other hand, URLs in the text input are often included in the middle of a sentence rather than on a separate line. Images that occur in the input text are often included on a separate line, but not always. While you could identify text that represents URLs and images by processing the input line-by-line — or even word-by-word, if necessary — it is easier to use regular expressions and process the entire input as a single string rather than by individual lines.
Regular expressions, though they are powerful and the appropriate tool to use for this job, come with a performance cost, which is another reason to use each expression only once for the entire text input.
Remember: All the JavaScript in this example runs each time the user types a character, so it is important to keep things as lightweight and efficient as possible.
I also want to make a note about the variable names in our convert_images_and_links_to_HTML()
function. images
(plural), image
(singular), and link
are reserved words in JavaScript. Consequently, imgs
, img
, and a_tag
were used for naming. Interestingly, these specific reserved words are not listed on the relevant MDN page, but they are on W3Schools.
We’re using the String.prototype.match()
function for each of the two regular expressions, then storing the results for each call in an array. From there, we use the nullish coalescing operator (??
) on each call so that, if no matches are found, the result will be an empty array. If we do not do this and no matches are found, the result of each match()
call will be null
and will cause problems downstream.
const urls = string.match(/https*:\/\/[^\s<),]+[^\s<),.]/gmi) ?? [];
const imgs = string.match(/[^"'>\s]+\.(jpg|jpeg|gif|png|webp)/gmi) ?? [];
Next up, we filter the arrays of results so that each array contains only unique results. This is a critical step. If we don’t filter out duplicate results and the input text contains multiple instances of the same URL or image file name, then we break the HTML tags in the output. JavaScript does not provide a simple, built-in method to get unique items in an array that’s akin to the PHP array_unique()
function.
The code snippet works around this limitation using an admittedly ugly but straightforward procedural approach. The same problem is solved using a more functional approach if you prefer. There are many articles on the web describing various ways to filter a JavaScript array in order to keep only the unique items.
We’re also checking if the URL is matched as an image before replacing a URL with an appropriate tag and performing the replacement only if the URL doesn’t match an image. We may be able to avoid having to perform this check by using a more intricate regular expression. The example code deliberately uses regular expressions that are perhaps less precise but hopefully easier to understand in an effort to keep things as simple as possible.
And, finally, we’re replacing image file names in the input text with tags that have the
src
attribute set to the image file name. For example, my_image.png
in the input is transformed into in the output. We wrap each
tag with an
tag that links to the image file and opens it in a new tab when clicked.
There are a couple of benefits to this approach:
- In a real-world scenario, you will likely use a CSS rule to constrain the size of the rendered image. By making the images clickable, you provide users with a convenient way to view the full-size image.
- If the image is not a local file but is instead a URL to an image from a third party, this is a way to implicitly provide attribution. Ideally, you should not rely solely on this method but, instead, provide explicit attribution underneath the image in a
,, or similar element. But if, for whatever reason, you are unable to provide explicit attribution, you are at least providing a link to the image source.
It may go without saying, but “hotlinking” images is something to avoid. Use only locally hosted images wherever possible, and provide attribution if you do not hold the copyright for them.
Before we move on to displaying the converted output, let’s talk a bit about accessibility, specifically the image alt
attribute. The example code I provided does add an alt
attribute in the conversion but does not populate it with a value, as there is no easy way to automatically calculate what that value should be. An empty alt
attribute can be acceptable if the image is considered “decorative,” i.e., purely supplementary to the surrounding text. But one may argue that there is no such thing as a purely decorative image.
That said, I consider this to be a limitation of what we’re building.
Displaying the Output HTML
We’re at the point where we can finally work on displaying the HTML-encoded output! We’ve already handled all the work of converting the text, so all we really need to do now is call it:
function convert(input_string) {
output.innerHTML = convert_images_and_links_to_HTML(convert_text_to_HTML(html_encode(input_string)));
}
If you would rather display the output string as raw HTML markup, use a
:
The only thing to note about this approach is that you would target the
element’s textContent
instead of innerHTML
:function convert(input_string) {
output.textContent = convert_images_and_links_to_HTML(convert_text_to_HTML(html_encode(input_string)));
}
Conclusion
We did it! We built one of the same sort of copy-paste tool that converts plain text on the spot. In this case, we’ve configured it so that plain text entered into a
is parsed line-by-line and encoded into HTML that we format and display inside another element.
We were even able to keep the solution fairly simple, i.e., vanilla HTML, CSS, and JavaScript, without reaching for a third-party library or framework. Does this simple solution do everything a ready-made tool like a framework can do? Absolutely not. But a solution as simple as this is often all you need: nothing more and nothing less.
As far as scaling this further, the code could be modified to POST
what’s entered into the
using a PHP script or the like. That would be a great exercise, and if you do it, please share your work with me in the comments because I’d love to check it out.
References
(gg, yk)