Replacing smart quotes, em-dashes, and ellipses with MySQL or PHP

March 5, 2009
Alternate title: "Help! My Quotes Appear as Question Marks or Other Strange Characters!" The "Smart quotes" feature in Microsoft Office transforms straight quotes into curly quotes. It also transforms hyphens into em-dashes and three periods into ellipses. While one might think, "How lovely! My document looks almost as if I'm educated!" readers of said document may not. Microsoft, in its infinite wisdom, decided to assign special characters such as the ones we just mentioned to a range of codes above 128. Problem: these codes aren't compatible with other character sets such as ISO-8859-1 or UTF-8, resulting in frustrating issues with non-Microsoft systems. Keep reading for some PHP and MySQL code to help out with this issue. Our introduction to this was in a situation where we had people using many different systems submitting articles to one of our programs. We decided that we wanted all our articles to use straight quotes, hyphens, and periods. This was partly for consistency, and partly because these characters are common to many character sets and won't cause incompatibilities. Should your requirements be different, it should be trivial to modify the code below to fit your specific situation. Here are some MySQL and PHP techniques for replacing all instances of smart quotes, plus the en dash, em dash, and ellipsis with straight quotes, one or two dashes, or three dots. This code should operate with both the Windows-1252 charset, and also UTF-8, an extended character set that is in many situations the "best" character set to use for email and websites. MySQL:
# First, replace UTF-8 characters.
UPDATE `t` SET `c` = REPLACE(`c`, 0xE28098, "'");
UPDATE `t` SET `c` = REPLACE(`c`, 0xE28099, "'");
UPDATE `t` SET `c` = REPLACE(`c`, 0xE2809C, '"');
UPDATE `t` SET `c` = REPLACE(`c`, 0xE2809D, '"');
UPDATE `t` SET `c` = REPLACE(`c`, 0xE28093, '-');
UPDATE `t` SET `c` = REPLACE(`c`, 0xE28094, '--');
UPDATE `t` SET `c` = REPLACE(`c`, 0xE280A6, '...');
# Next, replace their Windows-1252 equivalents.
UPDATE `t` SET `c` = REPLACE(`c`, char(145), "'");
UPDATE `t` SET `c` = REPLACE(`c`, char(146), "'");
UPDATE `t` SET `c` = REPLACE(`c`, char(147), '"');
UPDATE `t` SET `c` = REPLACE(`c`, char(148), '"');
UPDATE `t` SET `c` = REPLACE(`c`, char(150), '-');
UPDATE `t` SET `c` = REPLACE(`c`, char(151), '--');
UPDATE `t` SET `c` = REPLACE(`c`, char(133), '...');
PHP:
// First, replace UTF-8 characters.
$text = str_replace(
 array("\xe2\x80\x98", "\xe2\x80\x99", "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x93", "\xe2\x80\x94", "\xe2\x80\xa6"),
 array("'", "'", '"', '"', '-', '--', '...'),
 $text);
// Next, replace their Windows-1252 equivalents.
 $text = str_replace(
 array(chr(145), chr(146), chr(147), chr(148), chr(150), chr(151), chr(133)),
 array("'", "'", '"', '"', '-', '--', '...'),
 $text);
Additionally, here's a table of character codes that you may find useful: Windows-1252 characters not present in ISO-8859-1 Further reading from Wikipedia: UTF-8 ISO/IEC 8859-1 Windows-1252
Chuck
July 17, 2009
I spent about 2 hours trying to figure this out. You're awesome Mango!
Nirav Shah
July 17, 2009
YOU ROCK Mango!
Ananth
September 28, 2009
Thanks a LOT dude!!!! I was about to give up on trying to replace ellipses with triple dots till I came across your post. You be a life saver :)

- Ananth
Don Jones
November 29, 2009
very useful - and easily extended to include \x99 and other goofiness. Many thanks.
Jeremy
December 1, 2009
You are a god among men. I've spent all day trying to fix these dumb smart quotes and nowhere else on the web could I find your bit where you replace the following characters. This solved it. I owe you everything.

"\xe2\x80\x98", "\xe2\x80\x99", "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x93", "\xe2\x80\x94", "\xe2\x80\xa6"
Mango
December 6, 2009
You guys be careful - you're boosting my already-inflated ego! :)

I am glad to hear this is helping.
AIPF
February 23, 2010
Dude, thanks. I was about to tear out one of my three remaining hairs. This was simple, rocks and lets me get on with life.
Mango
February 23, 2010
If Microsoft was preventing you from getting on with your life I am very glad to have been able to help with that!
AIPF
February 23, 2010
Welll... I was referring to my programming life... :)
theJoker
August 18, 2010
Thanks for this, I spent a long, long time trying to convert those stupid windows curly quotes. I really appreciate it.
cykan
September 15, 2010
hey thx man. i had some probs too, especially because the ellipsis. ^^ Thx!
Daniel
September 22, 2010
By the way, this conversion operates faster on large tables with a WHERE clause (PostgreSQL):

UPDATE note
SET note_text = REPLACE(note_text, E'\…', E'\.\.\.')
WHERE note_text LIKE E'%\…%';
Jesse
March 27, 2011
u r a CHAMP! Thank you sir!
Dan Metzner
March 29, 2011
Thank you! I have to accommodate numerous people entering product information from various irritating sources including MS Word. This is such a simple lifesaver!
Kosta Kontos
June 3, 2011
This is awesome! Works like a charm.
Thanks so much :)
Cheers from South Africa
Eirik O.
August 11, 2011
Thank you for this info!

One note..
I got an error in parsing the Ellipsis when using chr(133) in php.
(It changed all occurrences of norwegian "Å")
Php ord() gave 226 as the ellipsis and that worked well.

Best Regards
Eirik
kbc
October 25, 2011
Great solution, thanks for sharing
gige
November 11, 2011
Thank you mate, great smart/curly quotes, em-dashes and ellipses reference!
Brandon
December 1, 2011
Dude, you're AWESOME. This totally worked, couldn't find this info anywhere else.
Mayur
December 7, 2011
We spend about 2 hrs figuring the problem. This worked like a charm.. Very good solution
Danny
December 15, 2011
Thank you! Customer has been driving me mad with this.
ClayHmn
January 29, 2012
Thank you!, quickly fixed a knotty problem.
Bhagirath Thaker
February 14, 2012
Simply awesome. Agree with many comments made. Spend the whole day.... You certainly made my day!!!! Thank you.
Anna
March 25, 2012
You are seriously my hero. For my site I have different people putting descriptions in from all of these different sources and word documents. It could be a cluster eff of question-mark-filled-diamonds, but now it's filled with marvelous straight quotes and hyphens! THANK YOU. SERIOUSLY YOU HAVE SAVED ME MANY HOURS.
Brinard
April 15, 2012
Thank you!!!
Heloisa
April 23, 2012
All my bosses were driving me crazy because of these stupid quotes, since e-mail clients do not support them and they were ruining our newsletters. Thank you so very much -)
Deftbird
September 26, 2012
Awesome, Thanks!
Patrick Larson
October 17, 2012
I really appreciate this fix. Same thing as Heloisa, newsletters built with microsoft word were not reading properly on some browsers and email clients. You saved the world a bunch of irritation.
Rich
December 4, 2012
I usually just find code examples that fix whatever my issue is and go on, but in this case I spent the better part of a day and tried 50 different things (NONE worked) until I found this....You deserve ALL the credit. Thanks you for posting.
Edward
January 23, 2013
This has solved my issue, thank you! I tried so many alternatives and nothing else worked, then I discovered this!
Vestigal
June 5, 2013
This was very helpful; thank you!
Charles
June 20, 2013
Life Saver!
Heng Chamroeun
July 24, 2013
Great! Thank you very much!
Ron
December 12, 2013
Terrific solution, especially with curly close double quotes, as that uses a non printing ASCII 159(dec)/9D(hex) code, making it difficult to find the correct character to replace.
Jan
March 20, 2014
Thanks for your work, it saves me a lot of time.

I'd like to note that the second part related to Windows-1252 conversion causes problem in my case. My input string is UTF8. When testing with polish 'ą' it cuts off it and all following characters?!

Sample:
Input: 'abcdąefgh'
Output: 'abcd'

I think that for UTF8 input it is better not to convert Windows-1252 characters. For many people it might be obvious, but I'm posting it here in case somebody was struggling with it ;)
CC
June 30, 2014
Thank you very much, you saved my life :)
My PHP application saves teachers reports and then exports to a word merge and those smart quotes were crashing my program.
Thank you again.
Kenton
August 2, 2014
Thank you! Your posting remains timely even to this day. It helped me greatly to troubleshoot an ingestion issue into HBase from Ruby due to 'bad' characters.
Mango
August 2, 2014
Sweet. :)
haluk karamete
November 13, 2014
Another DUDE, YOU ARE AWESOME from me too!
Alex
April 19, 2015
This is 2015, and you are still god .. thanks man!
Jella
March 22, 2016
Great mate. that really helped and saved so much time. I was getting this Right Apostrophe mark from Sql Server, i'm trying to store this in mysql database. your script solved the problem.
geert
June 13, 2016
Genius!
Anonymous
June 14, 2018
Thanks a million. Spent 16 hours figuring out why some of the data from my database didn't show up. Just added the PHP code an it worked. Thanks!
Mango
June 14, 2018
I love you all. :)