Как проверить адрес электронной почты с помощью регулярного выражения?

На протяжении многих лет я медленно разрабатывал регулярное выражение, которое правильно проверяет адреса электронной почты MOST, предполагая, что они не используют IP-адрес в качестве серверной части.

Я использую его в нескольких PHP-программах, и он работает большую часть времени. Тем не менее, время от времени я связываюсь с кем-то, у кого возникают проблемы с сайтом, который его использует, и мне приходится делать некоторые корректировки (совсем недавно я понял, что я не разрешаю 4-символьные TLD).

Какое лучшее регулярное выражение у вас есть или вы видели для проверки электронной почты?

Я видел несколько решений, которые используют функции, которые используют несколько более коротких выражений, но я предпочел бы иметь одно длинное сложное выражение в простой функции вместо нескольких коротких выражений в более сложной функции.

    Полностью RFE 822-совместимое регулярное выражение неэффективно и неясно из-за его длины. К счастью, RFC 822 был заменен дважды, а текущая спецификация адресов электронной почты – RFC 5322 . RFC 5322 приводит к регулярному выражению, которое можно понять, если его изучить в течение нескольких минут и достаточно эффективно для фактического использования.

    Одно регулярное выражение, поддерживающее RFC 5322, можно найти в верхней части страницы по адресу http://emailregex.com/, но использует шаблон IP-адреса, который плавает по интернету, с ошибкой, которая позволяет 00 для любого из десятичных знаков без знака в точечный адрес, который является незаконным. Остальная часть, по-видимому, согласуется с грамматикой RFC 5322 и проходит несколько тестов с использованием grep -Po , включая имена доменов, IP-адреса, плохие и имена учетных записей с кавычками и без них.

    Исправляя ошибку 00 в шаблоне IP, мы получаем рабочее и довольно быстрое регулярное выражение. (Скопируйте визуализированную версию, а не уценку, для фактического кода.)

    (: [А-z0-9 # $% & ‘* + / = ^ _ `{|} ~ – + (?.!: \ [А-z0-9 # $% &!] * + / ? = ^ _ `{|} ~ -] +) * |” (?: [\ x01- \ x08 \ x0b \ x0c \ x0e- \ x1f \ x21 \ x23- \ X5b \ x5d- \ x7f] | \\ [\ x01- \ x09 \ x0b \ x0c \ x0e- \ x7f]) * “) @ (: (: [а-z0-9] (?:? [а-z0-9 -] * [а-z0 ? -9]) \) + [а-z0-9] (:.? [а-z0-9 -] * [а-z0-9]) | \ [(:( 🙁 2 (5? [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] |?. [1-9] [0-9])) \) {3} ( ? 🙁 2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9] [0-9]) |? [ а-z0-9 -] * [а-z0-9]: (?: [\ x01- \ x08 \ x0b \ x0c \ x0e- \ x1f \ x21- \ X5a \ x53- \ x7f] | \\ [\ x01- \ x09 \ x0b \ x0c \ x0e- \ x7f]) +) \])

    Здесь приведена диаграмма конечного автомата для вышеперечисленного регулярного выражения, которая более ясна, чем само регулярное выражение введите описание изображения здесь

    Более сложные шаблоны в Perl и PCRE (библиотека регулярных выражений, используемая, например, на PHP) могут корректно проанализировать RFC 5322 без заминки . Python и C # тоже могут это сделать, но они используют другой синтаксис из этих первых двух. Однако, если вы вынуждены использовать один из многих менее мощных языков сопоставления с образцами, тогда лучше всего использовать настоящий парсер.

    Также важно понимать, что проверка его в RFC абсолютно не говорит о том, действительно ли этот адрес существует в поставляемом домене, или является ли человек, входящий в адрес, его истинным владельцем. Люди подписывают других до списков рассылки таким образом все время. Фиксирование, требующее более любезной проверки, которая включает отправку этого адреса, сообщение, которое включает токен подтверждения, предназначенный для ввода на той же веб-странице, что и адрес.

    Подтверждающие жетоны – единственный способ узнать, что вы получили адрес человека, который его вводил. Вот почему большинство списков рассылки теперь используют этот механизм для подтверждения регистрации. В конце концов, любой может подавить [email protected] , и это будет даже рассматривать как законное, но вряд ли это будет с другой стороны.

    Для PHP вы не должны использовать шаблон, указанный в « Подтвердить адрес электронной почты с PHP», правильный путь, из которого я цитирую:

    Существует некоторая опасность того, что обычное использование и широкое неаккуратное кодирование установят фактический стандарт для адресов электронной почты, который является более ограничительным, чем записанный формальный стандарт.

    Это не лучше, чем все другие не-RFC-шаблоны. Он даже не достаточно умен, чтобы обрабатывать даже RFC 822 , не говоря уже о RFC 5322. Это , однако, так.

    Если вы хотите получить фантазию и педантичность, выполните полный механизм состояния . Регулярное выражение может действовать только как элементарный фильтр. Проблема с регулярными выражениями заключается в том, что кто-то говорит, что их совершенно корректный адрес электронной почты недействителен (ложный положительный результат), потому что ваше регулярное выражение не может справиться с этим, это просто грубо и невежливо с точки зрения пользователя. Механизм состояния для этой цели может как проверять, так и даже исправлять адреса электронной почты, которые в противном случае считались бы недействительными, поскольку он разбирает адрес электронной почты в соответствии с каждым RFC. Это позволяет потенциально более приятный опыт, например

    Указанный адрес электронной почты ‘myemail @ address, com’ недействителен. Вы имели в виду ‘[email protected]’?

    См. Также Проверка адресов электронной почты , включая комментарии. Или Сравнение адреса электронной почты с подтверждением регулярных выражений .

    Визуализация регулярных выражений

    Демоверсия Debuggex

    Вы не должны использовать регулярные выражения для проверки адресов электронной почты.

    Вместо этого используйте class MailAddress , например:

     try { address = new MailAddress(address).Address; } catch(FormatException) { //address is invalid } 

    Класс MailAddress использует парсер BNF для проверки адреса в полном соответствии с RFC822.

    Если вы действительно хотите использовать регулярное выражение, вот оно :

      (?: (?: \ r \ n)? [\ t]) * (?: (?: (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031 ] + (?: (?: (?: \ r \ n)? [\ t]
     ) + | \ Z | (= [\ [? "() <> @,;: \\". \ [\]])) | "? (: [^ \" \ Г \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(? :( ?:
     \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \ \ ". \ [\] \ 000- \ 031] + (? :(? :(
     ?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [ ^ \ "\ г \\] | \\ | (:.? (: \ г \ п) [? 
     \ t])) * "(?: (?: \ r \ n)? [\ t]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 0
     31] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\ ]])) | \ [([^ \ [\] \ г \\] |. \\) * \
     ] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] +
     (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]]) ) | \ [([^ \ [\] \ г \\] |. \\) * \] (?:
     (?: \ r \ n)? [\ t]) *)) * | (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z
     | (= [\ [ "() <> @,;: \\?". \ [\]]).?) | "? (: [^ \" \ Г \\] | \\ | (:( ?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)
     ? [\ t]) *) * \ <(?: (?: \ r \ n)? [\ t]) * (?: @ (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \
     r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) \ \ [([^ \ [\ ] \ г \\] | \\) * \] (: (: \ г \ п?) [.?
      \ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ г \ п)
     ? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ r \ \] | \\.) * \] (?: (?: \ r \ n)? [\ t]
     ) *)) * (?:, @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ г \ п) [?
      \ Т]) + | \ Z | (= [\ [ "(<> @,;: \\. \ [\]?)"])) | \ [([^ \ [\] \ Г \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *
     ) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031 ] + (?: (?: (?: \ r \ n)? [\ t]
     ) + | \ Z | (= [\ [? "() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ Г \\] | \\ .) * \] (?: (?: \ r \ n)? [\ t]) *)) *)
     *: (?: (?: \ r \ n)? [\ t]) *)? (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) +
     | \ Z | (= [\ [ "() <> @,;: \\?". \ [\]]).) | "? (: [^ \" \ Г \\] | \\ | ( ?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r
     \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ " . \ [\] \ 000- \ 031] + (? :(? :( ?:
     \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) "" (?: [^ \ «\ r \\] | \\. | (?: (?: \ r \ n)? [\ t
     ])) * "(?: (?: \ r \ n)? [\ t]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031
     ] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\] ])) | \ [([^ \ [\] \ г \\] |. \\) * \] (
     ?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?
     : (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ г \\] | \\.) * \] (:(?
     : \ r \ n)? [\ t]) *)) * \> (?: (?: \ r \ n)? [\ t]) *) | (?: [^ () <> @ ;; : \\ ". \ [\] \ 000- \ 031] + (? :(?
     : (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]]) | "(? : [^ \ "\ г \\] | \\ | (:. (: \ г \ п)??
     [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) *: (?: (?: \ r \ n)? [\ t]) * (?: (: (: [^ () <> @,;:?. \\»\ [\] 
     \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\" . \ [\]])) | "(?: [^ \" \ г \\] |
     \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) (?: \. (? : (?: \ r \ n)? [\ t]) * (?: [^ () <>
    
     @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [ "() <> @,;: \\". \ [\]])) |»
     (?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) *" (?: (?: \ r \ n)? [ \ t]) *)) * @ (?: (?: \ r \ n)? [\ t]
     ) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (= [\ [ "() <> @,;: \\?
     ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) * ) (?: \. (?: (?: \ r \ n)? [\ t]) * (?
     : [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z ?. | (= [\ [ "() <> @,;: \\" \ [
     \]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * | (?: [^ () <> @,;: \\ ". \ [\] \ 000-
     \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [ \]])) | "(?: [^ \". \ г \\] | \\ | (
     ?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) * \ <(?: (?: \ r \ n)? [\ t]) * (?: @ (?: [^ () <> @ ,;
     : \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [" () <> @,;:. \\»\ [\]])) | \ [([
     ^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ "
     . \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @, ;:. \\»\ [\]])) | \ [([^ \ [\
     ] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * (?:, @ (?: (?: \ r \ n )? [\ t]) * (?: [^ () <> @,;: \\ ". \
     [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\»\ [\]])) |. \ [([^ \ [\] \
     r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [ т]) * (: [^ (<> @,.: \\»\ [\?)] 
     \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\" . \ [\]])) | \ [([^ \ [\] \ г \\]
     | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) *) *: (?: (?: \ r \ n)? [\ t]) * )? (?: [^ () <> @,;: \\ ". \ [\] \ 0
     00- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ г \\] | \\
     . (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) (?: \. (? :( ?: \ r \ n)? [\ t]) * (?: [^ () <> @,
     : \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [" ( ) <> @,;: \\ | ( "\ [\]]))."?
     : [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) *" (?: (?: \ r \ n)? [\ t ]) *)) * @ (?: (?: \ r \ n)? [\ t]) *
     (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (= [\ [ "() <> @,;: \\?".
     \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) ( ?: \. (?: (?: \ r \ n)? [\ t]) * (?: [
     ^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | ( ? = [\ [ "() <> @,;: \\". \ [\]
     ])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * \> ( ?: (?: \ r \ n)? [\ t]) *) (?:, \ s * (
     ?: (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (= [\ [ "() <> @,;: \\?
     ". \ [\]])) |" (?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) *" (? : (?: \ r \ n)? [\ t]) *) (?: \. (? :(
     ?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (? :(? :(? : \ r \ n)? [\ t]) + | \ Z | (? = [
     \ [ "() <> @,;: \\". \ [\]].?)) | "? (: [^ \" \ Г \\] | \\ | (: (: \ г \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t
     ]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T
     ]) + | \ Z | (= [\ [ "(<> @,;: \\. \ [\]?)"])) | \ [([^ \ [\] \ Г \\] | \ \.) * \] (?: (?: \ r \ n)? [\ t]) *) (?
     : \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + ( ?: (?: (?: \ r \ n)? [\ t]) + |
     \ Z | (= [\ [ "() <> @,;: \\?". \ [\]]).) | \ [([^ \ [\] \ Г \\] | \\) * \] (?: (?: \ r \ n)? [\ t]) *)) * | (?:
     [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | ? (= [\ [ "() <> @,;: \\". \ [\
     ]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) * \ <(?: (?: \ r \ n)
     ? [\ t]) * (?: @ (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["
     () <> @,;:..? \\»\ [\]])) | \ [([^ \ [\] \ г \\] | \\) * \] (: (: \ г \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)
     ? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <>
    
     @,;: \\»\ [\]])) | \ [([^ \ [\] \ г \\] | \\) * \] (: (: \ г \ п)..?? [\ t]) *)) * (?:, @ (?: (?: \ r \ n)? [
      \ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ т]) + | \ Z | (? = [\ [ "() <> @,
     : \\ ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]
     ) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (= [\ [ "() <> @,;: \\?
     ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) * )) *) *: (?: (?: \ r \ n)? [\ t]) *)?
     (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (= [\ [ "() <> @,;: \\?".
     \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(? :( ?: \ r \ n)? [\ t]) *) (?: \. (? :( ?:
     \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ [
     "() <> @,;: \\". \ [\]])) | "(?: [^ \".? \ Г \\] | \\ | (: (: \ г \ п) ? [\ t])) * "(?: (?: \ r \ n)? [\ t])
     *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t])
     + | \ Z | (= [\ [ "() <> @,;: \\"? \ [\]].)) | \ [([^ \ [\] \ Г \\] | \\. ) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \
     . (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z
     | (= [\ [ "() <> @,;: \\?". \ [\]]).) | \ [([^ \ [\] \ Г \\] | \\) * \] (?: (?: \ r \ n)? [\ t]) *)) * \> (? :(
     ?: \ r \ n)? [\ t]) *)) *)?; \ s *) 

    Этот вопрос задают много, но я думаю, вы должны отступить и спросить себя, почему вы хотите синтаксически проверять адреса электронной почты? Какая польза действительно?

    • Он не поймает распространенные опечатки.
    • Это не мешает людям вводить неверные или заполненные адреса электронной почты или вводить чужой адрес.

    Если вы хотите проверить правильность адреса электронной почты, у вас нет выбора, кроме как отправить электронное письмо с подтверждением и ответить на него. Во многих случаях вам придется отправлять письмо с подтверждением в любом случае по соображениям безопасности или по этическим соображениям (поэтому вы не можете, например, подписывать кого-то на службу против своей воли).

    Это зависит от того, что вы имеете в виду: если вы говорите об улавливании каждого действительного адреса электронной почты, используйте следующее:

     (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*) 

    ( http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html ) Если вы ищете что-то более простое, но это поймает большинство действительных адресов электронной почты, попробуйте что-то вроде:

     "^[a-zA-Z0-9_.+-][email protected][a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" 

    EDIT: по ссылке:

    Это регулярное выражение будет проверять только адреса, у которых были сняты любые комментарии и заменены пробелами (это выполняется модулем).

    Все зависит от того, насколько вы точны. Для моих целей, когда я просто стараюсь [email protected] таких вещей, как bob @ aol.com (пробелы в сообщениях электронной почты) или steve (без какого-либо домена) или [email protected] (без периода до .com), я использую

     /^\[email protected]\S+\.\S+$/ 

    Конечно, это будет соответствовать вещам, которые не являются действительными адресами электронной почты, но это вопрос игры с правилом 90/10.

    [ОБНОВЛЕНО] Я собрал все, что я знаю о проверке адреса электронной почты здесь: http://isemail.info , который теперь не только проверяет, но и диагностирует проблемы с адресами электронной почты. Я согласен со многими комментариями здесь, что валидация является лишь частью ответа; см. мое эссе по адресу http://isemail.info/about .

    is_email () остается, насколько я знаю, единственным валидатором, который точно скажет вам, является ли данная строка допустимым адресом электронной почты или нет. Я загрузил новую версию на http://isemail.info/

    Я собрал тестовые примеры от Кэла Хендерсона, Дэйва Ребенка, Фила Хаака, Дуга Ловелла, RFC5322 и RFC 3696. Всего 275 тестовых адресов. Я провел все эти тесты со всеми бесплатными валидаторами, которые мог найти.

    Я постараюсь сохранить эту страницу в актуальном состоянии, так как люди повышают эффективность своих валидаторов. Спасибо Cal, Michael, Dave, Paul и Phil за помощь и сотрудничество в компиляции этих тестов и конструктивную критику моего собственного валидатора .

    Люди должны знать об ошибках в отношении RFC 3696 в частности. Три из канонических примеров на самом деле являются неверными адресами. Максимальная длина адреса – 254 или 256 символов, а не 320.

    В спецификации W3C HTML5 :

     ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-][email protected][a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$ 

    Контекст:

    Действительный адрес электронной почты – это строка, соответствующая продукции ABNF […].

    Примечание. Это требование является преднамеренным нарушением RFC 5322 , который определяет синтаксис для адресов электронной почты, которые одновременно слишком строгие (до символа «@»), слишком неопределенные (после символа «@») и слишком слабые ( позволяя использовать комментарии, символы пробелов и цитируемые строки в манерах, незнакомых большинству пользователей).

    Следующее стандартное выражение, совместимое с JavaScript и Perl, является реализацией указанного выше определения.

     /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-][email protected][a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

    Это легко в Perl 5.10 или новее:

     /(?(DEFINE) (?
    (?&mailbox) | (?&group)) (? (?&name_addr) | (?&addr_spec)) (? (?&display_name)? (?&angle_addr)) (? (?&CFWS)? < (?&addr_spec) > (?&CFWS)?) (? (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ; (?&CFWS)?) (? (?&phrase)) (? (?&mailbox) (?: , (?&mailbox))*) (? (?&local_part) \@ (?&domain)) (? (?&dot_atom) | (?&quoted_string)) (? (?&dot_atom) | (?&domain_literal)) (? (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?) (? (?&dtext) | (?&quoted_pair)) (? (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e]) (? (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|}~]) (? (?&CFWS)? (?&atext)+ (?&CFWS)?) (? (?&CFWS)? (?&dot_atom_text) (?&CFWS)?) (? (?&atext)+ (?: \. (?&atext)+)*) (? [\x01-\x09\x0b\x0c\x0e-\x7f]) (? \\ (?&text)) (? (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e]) (? (?&qtext) | (?&quoted_pair)) (? (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))* (?&FWS)? (?&DQUOTE) (?&CFWS)?) (? (?&atom) | (?&quoted_string)) (? (?&word)+) # Folding white space (? (?: (?&WSP)* (?&CRLF))? (?&WSP)+) (? (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e]) (? (?&ctext) | (?&quoted_pair) | (?&comment)) (? \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) ) (? (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS))) # No whitespace control (? [\x01-\x08\x0b\x0c\x0e-\x1f\x7f]) (? [A-Za-z]) (? [0-9]) (? \x0d \x0a) (? ") (? [\x20\x09]) ) (?&address)/x

    я использую

     ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$ 

    Which is the one used in ASP.NET by the RegularExpressionValidator.

    Don’t know about best, but this one is at least correct, as long as the addresses have their comments stripped and replaced with whitespace.

    Шутки в сторону. You should use an already written library for validating emails. The best way is probably to just send a verification e-mail to that address.

    The email addresses I want to validate are going to be used by an ASP.NET web application using the System.Net.Mail namespace to send emails to a list of people. So, rather than using some very complex regular expression, I just try to create a MailAddress instance from the address. The MailAddress construtor will throw an exception if the address is not formed properly. This way, I know I can at least get the email out of the door. Of course this is server-side validation but at a minimum you need that anyway.

     protected void emailValidator_ServerValidate(object source, ServerValidateEventArgs args) { try { var a = new MailAddress(txtEmail.Text); } catch (Exception ex) { args.IsValid = false; emailValidator.ErrorMessage = "email: " + ex.Message; } } 

    Quick answer

    Use the following regex for input validation:

    ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

    Addresses matched by this regex:

    • have a local part (ie the part before the @-sign) that is strictly compliant with RFC 5321/5322,
    • have a domain part (ie the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

    The second constraint is a restriction on RFC 5321/5322.

    Elaborate answer

    Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

    It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there’s no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn’t refer to an existing mailbox.

    The syntax of email addresses has been defined in various RFCs , most notably RFC 822 and RFC 5322 . RFC 822 should be seen as the “original” standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have restricted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

    In this answer I’ll take “email address” to mean addr-spec as defined in the RFCs (ie [email protected] , but not "John Doe" , nor some-group:[email protected],[email protected]; ).

    There’s one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can’t be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I’ll disregard comments and only consider proper regular expressions.

    The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation . Thus, the canonical representation of first. last (comment) @ [3.5.7.9] is [email protected][3.5.7.9] .

    Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

    I use POSIX “extended” regular expressions in this answer, assuming an ASCII compatible character set.

    RFC 822

    I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I’ll try to fix the expression as soon as possible.

    ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

    I believe it’s fully complient with RFC 822 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

    The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked “erratum”) and use the updated version as a subexpression in subsequent regular expressions.

    As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I’ve expanded the expressions to accommodate this rule and marked the result with “opt-lwsp”.

     CHAR =  =~ . CTL =  =~ [\x00-\x1F\x7F] CR =  =~ \r LF =  =~ \n SPACE =  =~ HTAB =  =~ \t <"> =  =~ " CRLF = CR LF =~ \r\n LWSP-char = SPACE / HTAB =~ [ \t] linear-white-space = 1*([CRLF] LWSP-char) =~ ((\r\n)?[ \t])+ specials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / <"> / "." / "[" / "]" =~ [][()<>@,;:\\".] quoted-pair = "\" CHAR =~ \\. qtext = , "\" & CR, and including linear-white-space> =~ [^"\\\r]|((\r\n)?[ \t])+ dtext =  =~ [^][\\\r]|((\r\n)?[ \t])+ quoted-string = <"> *(qtext|quoted-pair) <"> =~ "([^"\\\r]|((\r\n)?[ \t])|\\.)*" (erratum) =~ "(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-literal = "[" *(dtext|quoted-pair) "]" =~ \[([^][\\\r]|((\r\n)?[ \t])|\\.)*] (erratum) =~ \[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] atom = 1* =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+ word = atom / quoted-string =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-ref = atom sub-domain = domain-ref / domain-literal =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] local-part = word *("." word) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* domain = sub-domain *("." sub-domain) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* addr-spec = local-part "@" domain =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*(\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*)*@((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (canonical) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))* 

    RFC 5322

    I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I’ll try to fix the expression as soon as possible.

    ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*])

    I believe it’s fully complient with RFC 5322 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

    The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked “(normalized)” that doesn’t accept this whitespace.

    I ignored all the “obs-” rules from the RFC. This means that the regexes only match email addresses that are strictly RFC 5322 compliant. If you have to match “old” addresses (as the looser grammar including the “obs-” rules does), you can use one of the RFC 822 regexes from the previous paragraph.

     VCHAR = %x21-7E =~ [!-~] ALPHA = %x41-5A / %x61-7A =~ [A-Za-z] DIGIT = %x30-39 =~ [0-9] HTAB = %x09 =~ \t CR = %x0D =~ \r LF = %x0A =~ \n SP = %x20 =~ DQUOTE = %x22 =~ " CRLF = CR LF =~ \r\n WSP = SP / HTAB =~ [\t ] quoted-pair = "\" (VCHAR / WSP) =~ \\[\t -~] FWS = ([*WSP CRLF] 1*WSP) =~ ([\t ]*\r\n)?[\t ]+ ctext = %d33-39 / %d42-91 / %d93-126 =~ []!-'*-[^-~] ("comment" is left out in the regex) ccontent = ctext / quoted-pair / comment =~ []!-'*-[^-~]|(\\[\t -~]) (not regular) comment = "(" *([FWS] ccontent) [FWS] ")" (is equivalent to FWS when leaving out comments) CFWS = (1*([FWS] comment) [FWS]) / FWS =~ ([\t ]*\r\n)?[\t ]+ atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~" =~ [-!#-'*+/-9=?AZ^-~] dot-atom-text = 1*atext *("." 1*atext) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* dot-atom = [CFWS] dot-atom-text [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* qtext = %d33 / %d35-91 / %d93-126 =~ []!#-[^-~] qcontent = qtext / quoted-pair =~ []!#-[^-~]|(\\[\t -~]) (erratum) quoted-string = [CFWS] DQUOTE ((1*([FWS] qcontent) [FWS]) / FWS) DQUOTE [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ "([]!#-[^-~ \t]|(\\[\t -~]))+" dtext = %d33-90 / %d94-126 =~ [!-Z^-~] domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ \[[\t -Z^-~]*] local-part = dot-atom / quoted-string =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+" domain = dot-atom / domain-literal =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*] addr-spec = local-part "@" domain =~ ((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?) (normalized) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*]) 

    Note that some sources (notably w3c ) claim that RFC 5322 is too strict on the local part (ie the part before the @-sign). This is because “..”, “a..b” and “a.” are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of [email protected] you should write "a..b"@example.net , which is semantically equivalent.

    Further restrictions

    SMTP (as defined in RFC 5321 ) further restricts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this stricter grammar, so that the matched email address can actually be used to send an email.

    RFC 5321 basically leaves alone the “local” part (ie the part before the @-sign), but is stricter on the domain part (ie the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

    The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of “correcting” the rules in question, using this draft and RFC 1034 as guidelines. Here’s the resulting regex.

    ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

    Note that depending on the use case you may not want to allow for a “General-address-literal” in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the “General-address-literal” part to match malformed IPv6 addresses. Some regex processors don’t support negative lookahead. Remove the substring |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole “General-address-literal” part out.

    Here’s the derivation:

     Let-dig = ALPHA / DIGIT =~ [0-9A-Za-z] Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig =~ [0-9A-Za-z-]*[0-9A-Za-z] (regex is updated to make sure sub-domains are max. 63 charactes long - RFC 1034 section 3.5) sub-domain = Let-dig [Ldh-str] =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])? Domain = sub-domain *("." sub-domain) =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)* Snum = 1*3DIGIT =~ [0-9]{1,3} (suggested replacement for "Snum") ip4-octet = DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35 =~ 25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9] IPv4-address-literal = Snum 3("." Snum) =~ [0-9]{1,3}(\.[0-9]{1,3}){3} (suggested replacement for "IPv4-address-literal") ip4-address = ip4-octet 3("." ip4-octet) =~ (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement for "IPv6-hex") ip6-h16 = "0" / ( (%x49-57 / %x65-70 /%x97-102) 0*3(%x48-57 / %x65-70 /%x97-102) ) =~ 0|[1-9A-Fa-f][0-9A-Fa-f]{0,3} (not from RFC) ls32 = ip6-h16 ":" ip6-h16 / ip4-address =~ (0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement of "IPv6-addr") ip6-address = 6(ip6-h16 ":") ls32 / "::" 5(ip6-h16 ":") ls32 / [ ip6-h16 ] "::" 4(ip6-h16 ":") ls32 / [ *1(ip6-h16 ":") ip6-h16 ] "::" 3(ip6-h16 ":") ls32 / [ *2(ip6-h16 ":") ip6-h16 ] "::" 2(ip6-h16 ":") ls32 / [ *3(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 ":" ls32 / [ *4(ip6-h16 ":") ip6-h16 ] "::" ls32 / [ *5(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 / [ *6(ip6-h16 ":") ip6-h16 ] "::" =~ (((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?:: IPv6-address-literal = "IPv6:" ip6-address =~ IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::) Standardized-tag = Ldh-str =~ [0-9A-Za-z-]*[0-9A-Za-z] dcontent = %d33-90 / %d94-126 =~ [!-Z^-~] General-address-literal = Standardized-tag ":" 1*dcontent =~ [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ address-literal = "[" ( IPv4-address-literal / IPv6-address-literal / General-address-literal ) "]" =~ \[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)] Mailbox = Local-part "@" ( Domain / address-literal ) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)]) 

    User input validation

    A common use case is user input validation, for example on an html form. In that case it’s usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

    ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

    I do not recommend restricting the local part further, eg by precluding quoted strings, since we don’t know what kind of mailbox names some hosts allow (like "a..b"@example.net or even "ab"@example.net ).

    I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how “.museum” invalidated [az]{2,4} ), but if you must:

    ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?\.)*(net|org|com|info| etc… )

    Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

    Further considerations

    When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don’t enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is strictly speaking still regular, it’s not feasible to make a regex that incorporates this rule.

    Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple “syntactically incorrect address”. With “vanilla” regexes this is not possible.

    These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.

    None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

    There are plenty examples of this out on the net (and I think even one that fully validates the RFC – but it’s tens/hundreds of lines long if memory serves). People tend to get carried away validating this sort of thing. Why not just check it has an @ and at least one . and meets some simple minimum length. It’s trivial to enter a fake email and still match any valid regex anyway. I would guess that false positives are better than false negatives.

    While deciding which characters are allowed, please remember your apostrophed and hyphenated friends. I have no control over the fact that my company generates my email address using my name from the HR system. That includes the apostrophe in my last name. I can’t tell you how many times I have been blocked from interacting with a website by the fact that my email address is “invalid”.

    This regex is from Perl’s Email::Valid library. I believe it to be the most accurate, it matches all 822. And, it is based on the regular expression in the O’Reilly book:

    Regular expression built using Jeffrey Friedl’s example in Mastering Regular Expressions ( http://www.ora.com/catalog/regexp/ ).

     $RFC822PAT = <<'EOF'; [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\ xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\x ff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015 "]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\ xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 -\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]* )*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\ \\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\ x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n \015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([ ^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\ \\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\ x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80- \xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015() ]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\ x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04 0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\ n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\ 015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?! [^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\ ]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\ x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01 5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff] )|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^ ()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0 15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][ ^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\ n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\ x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(? :(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80- \xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]* (?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015 ()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015() ]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0 40)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\ [^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\ xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]* )*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80 -\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x 80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t ]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\ \[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff]) *\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x 80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80 -\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015( )]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\ \x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t ]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0 15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015 ()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^( \040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]| \\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80 -\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015() ]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff ])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\ \x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x 80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015 ()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\ \\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^ (\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000- \037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\ n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]| \([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\)) [^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff \n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*( ?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\ 000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\ xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*) *\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\x ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80- \xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*) *(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\ ]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\] )[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80- \xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*( ?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80 -\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)< >@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?: \([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()] *(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*) *\)[\040\t]*)*)*>) EOF 

    As you’re writing in PHP I’d advice you to use the PHP build-in validation for emails.

     filter_var($value, FILTER_VALIDATE_EMAIL) 

    If you’re running a php-version lower than 5.3.6 please be aware of this issue: https://bugs.php.net/bug.php?id=53091

    If you want more information how this buid-in validation works, see here: Does PHP’s filter_var FILTER_VALIDATE_EMAIL actually work?

    Cal Henderson (Flickr) wrote an article called Parsing Email Adresses in PHP and shows how to do proper RFC (2)822-compliant Email Address parsing. You can also get the source code in php , python and ruby which is cc licensed .

    I never bother creating with my own regular expression, because chances are that someone else has already come up with a better version. I always use regexlib to find one to my liking.

    There is not one which is really usable.
    I discuss some issues in my answer to Is there a php library for email address validation? , it is discussed also in Regexp recognition of email address hard?

    In short, don’t expect a single, usable regex to do a proper job. And the best regex will validate the syntax, not the validity of an e-mail ([email protected] is correct but it will probably bounce…).

    One simple regular expression which would at least not reject any valid email address would be checking for something, followed by an @ sign and then something followed by a period and at least 2 somethings. It won’t reject anything, but after reviewing the spec I can’t find any email that would be valid and rejected.

    email =~ /[email protected][^@]+\.[^@]{2,}$/

    You could use the one employed by the jQuery Validation plugin:

     /^((([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i 

    For the most comprehensive evaluation of the best regular expression for validating an email address please see this link; ” Comparing E-mail Address Validating Regular Expressions ”

    Here is the current top expression for reference purposes:

     /^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~][email protected]((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[az])\.)+[az]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i 

    Not to mention that non-Latin (Chinese, Arabic, Greek, Hebrew, Cyrillic and so on) domain names are to be allowed in the near future . Everyone has to change the email regex used, because those characters are surely not to be covered by [az]/i nor \w . They will all fail.

    After all, the best way to validate the email address is still to actually send an email to the address in question to validate the address. If the email address is part of user authentication (register/login/etc), then you can perfectly combine it with the user activation system. Ie send an email with a link with an unique activation key to the specified email address and only allow login when the user has activated the newly created account using the link in the email.

    If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn’t look like in the right format, best is still to check if it matches basically the following regex:

     ^([^[email protected]]+)(\.[^[email protected]]+)*@([^[email protected]]+\.)+([^[email protected]]+)$ 

    Просто как тот. Why on earth would you care about the characters used in the name and domain? It’s the client’s responsibility to enter a valid email address, not the server’s. Even when the client enters a syntactically valid email address like [email protected] , this does not guarantee that it’s a legit email address. No one regex can cover that.

    The HTML5 spec suggests a simple regex for validating email addresses:

     /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-][email protected][a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

    This intentionally doesn’t comply with RFC 5322 .

    Note: This requirement is a willful violation of RFC 5322 , which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

    The total length could also be limited to 254 characters, per RFC 3696 errata 1690 .

    For a vivid demonstration, the following monster is pretty good but still does not correctly recognize all syntactically valid email addresses: it recognizes nested comments up to four levels deep.

    This is a job for a parser, but even if an address is syntactically valid, it still may not be deliverable. Sometimes you have to resort to the hillbilly method of “Hey, y’all, watch ee-us!”

     // derivative of work with the following copyright and license: // Copyright (c) 2004 Casey West. All rights reserved. // This module is free software; you can redistribute it and/or // modify it under the same terms as Perl itself. // see http://search.cpan.org/~cwest/Email-Address-1.80/ private static string gibberish = @" (?-xism:(?:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:(?-xism:[ ^\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+(?-xism:(?-xi sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xis m:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\ s*)+|\s+)*))+)?(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(? -xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*<(?-xism:(?-xi sm:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^( )\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*( ?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D])) |)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()< >\[\]:;@\,.\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s] +)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism: (?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s *\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xi sm:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)* (?-xism:(?-xism:[^\\])|(?-xism:\\(?-xism:[^\x0A\x0D] )))+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-x ism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+ )*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism:( ?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(? -xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s *\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+( ?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+)*)(?-xism:(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*) +|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?-x ism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-xi sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism: \\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+ )*\s*\)\s*)+|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?- xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))|(?-xism:(?-x ism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s* (?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]) )|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F() <>\[\]:;@\,.\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s ]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism :(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\ s*\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-x ism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+) *(?-xism:(?-xism:[^\\])|(?-xism:\\(?-xism:[^\x0A\x0D ])))+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\ \]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?- xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|) +)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism: (?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\( ?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[ ^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\ s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+ (?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+)*)(?-xism:(?-xism :\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism: [^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+ ))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s* )+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism :(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\( (?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A \x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?- xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-x ism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism :\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*)) +)*\s*\)\s*)+|\s+)*))))(?-xism:\s*\((?:\s*(?-xism:(?-xism:(? >[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?: \s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0 D]))|)+)*\s*\)\s*))+)*\s*\)\s*)*)" .Replace("", "\"") .Replace("\t", "") .Replace(" ", "") .Replace("\r", "") .Replace("\n", ""); private static Regex mailbox = new Regex(gibberish, RegexOptions.ExplicitCapture); 

    Here’s the PHP I use. I’ve choosen this solution in the spirit of “false positives are better than false negatives” as declared by another commenter here AND with regards to keeping your response time up and server load down … there’s really no need to waste server resources with a regular expression when this will weed out most simple user error. You can always follow this up by sending a test email if you want.

     function validateEmail($email) { return (bool) stripos($email,'@'); } 

    According to official standard RFC 2822 valid email regex is

     (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) 

    if you want to use it in Java its really very easy

     import java.util.regex.*; class regexSample { public static void main(String args[]) { //Input the string for validation String email = "[email protected]"; //Set the email pattern string Pattern p = Pattern.compile(" (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" +"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")" + "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\]"); //Match the given string with the pattern Matcher m = p.matcher(email); //check whether match is found boolean matchFound = m.matches(); if (matchFound) System.out.println("Valid Email Id."); else System.out.println("Invalid Email Id."); } } 

    RFC 5322 standard:

    Allows dot-atom local-part, quoted-string local-part, obsolete (mixed dot-atom and quoted-string) local-part, domain name domain, (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain, and (nested) CFWS.

     '/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD' 

    RFC 5321 standard:

    Allows dot-atom local-part, quoted-string local-part, domain name domain, and (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain.

     '/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"[email protected])(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD' 

    Basic:

    Allows dot-atom local-part and domain name domain (requiring at least two domain name labels with the TLD limited to 2-6 alphabetic characters).

     "/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,126}[az]{2,6}$/iD" 

    Strange that you “cannot” allow 4 characters TLDs. You are banning people from .info and .name , and the length limitation stop .travel and .museum , but yes, they are less common than 2 characters TLDs and 3 characters TLDs.

    You should allow uppercase alphabets too. Email systems will normalize the local part and domain part.

    For your regex of domain part, domain name cannot starts with ‘-‘ and cannot ends with ‘-‘. Dash can only stays in between.

    If you used the PEAR library, check out their mail function (forgot the exact name/library). You can validate email address by calling one function, and it validates the email address according to definition in RFC822.

     public bool ValidateEmail(string sEmail) { if (sEmail == null) { return false; } int nFirstAT = sEmail.IndexOf('@'); int nLastAT = sEmail.LastIndexOf('@'); if ((nFirstAT > 0) && (nLastAT == nFirstAT) && (nFirstAT < (sEmail.Length - 1))) { return (Regex.IsMatch(sEmail, @"^[az|0-9|AZ]*([_][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*(([_][az|0-9|AZ]+)*)[email protected][az][az|0-9|AZ]*\.([az][az|0-9|AZ]*(\.[az][az|0-9|AZ]*)?)$")); } else { return false; } } 
    Interesting Posts

    Ошибка входа в веб-приложение для пользователя «NT AUTHORITY \ ANONYMOUS LOGON»

    Настроить десятичную точку вместо запятой в вводе номера HTML5 (на стороне клиента)

    Как Hadoop выполняет входные расщепления?

    В чем разница строк в одиночных или двойных кавычках в groovy?

    Что делает этот язык выражения $ {pageContext.request.contextPath} в JSP EL?

    Почему 16 байтов рекомендуемый размер для структуры в C #?

    Улучшить производительность SQLite в секунду в секунду?

    Как добавить изображение в JPanel?

    Как сравнить значения родовых типов?

    java.lang.UnsupportedClassVersionError: неверный номер версии в .class файле?

    Кастинг против ключевого слова «как» в CLR

    Двусторонний / двунаправленный словарь в C #?

    Сериализация с Jackson (JSON) – получение «Серийный анализатор не найден»?

    Почему мой jQuery: not () не работает в CSS?

    mylib.so имеет текстовые перестановки. Это напрасно тратит память и представляет угрозу безопасности. Пожалуйста исправьте

    Давайте будем гением компьютера.