How to read a man page? (repost)

#1 | Original poster | Posted 2009-6-30 09:28:07

Keywords: man, page

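Being able to read man pages is the most basic requirement for Linux development, yet many beginners really dislike reading them. In our teaching we found that although we stress from the very first day of programming class that you must read the man page (RTFM = read the f*cking manual), many students do everything they can to avoid it. After a month, more than half of the students have still never read a single man page carefully.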

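For example, a book called "Common Linux C Functions (Chinese Edition)" is the students' favorite. We have never recommended it, nor provided it in electronic or printed form, yet almost every student has a copy. The book's style is completely different from a man page: its descriptions of function interfaces are very brief and fall far short of covering the key points of the man page, yet every function is tirelessly followed by an example, even when the usage is already as plain as a louse on a bald head. These examples are usually very sloppy; for instance, they never check return values for errors. If you ask me, this book is garbage; its existence not only wastes space but does real harm. It is true that it suits beginners for quick lookup, but people are lazy, and beginners tend to become dependent on it. They no longer need to read the man page, nor want to. Why read the man page? It rambles on at length, takes ages to read and still makes no sense, and in the end there is not even an example, so you still don't know how to call the function. How much easier to learn from this book: you don't even have to read the text, just paste the example into your own code.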

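This is how beginners are poisoned. First, as said above, the examples are sloppy and full of bugs; they are garbage code, and whoever uses them ends up with garbage code too. Second, the descriptions are too brief and easily lead to one-sided understanding and misunderstanding. Third, the book feeds beginners' laziness: they can write plenty of programs with it, but their English, comprehension, and technical skill stagnate for a long time, which hardly counts as learning. Fourth, the book covers only a limited number of C functions. Real work will of course call for many functions that are not in it; a look at the man page would be enough to use them, but beginners who can no longer do without the book will inevitably find makeshift workarounds, substituting functions that are in the book for those that are not. In this way, this masterpiece has trained a large batch of qualified garbage-code producers.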

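Another book, "Linux C Function Library Dictionary", is a typical representative of the same kind and much the same as the one above. To digress a little, I hold an even more extreme view: no documentation meant for programmers should be translated into Chinese, because someone who cannot read English fluently is not a competent programmer and should learn English well before learning to program. Besides, translation always introduces new errors and inaccuracies and lowers the quality of the documentation. Only documentation meant for end users should be translated into Chinese, because you cannot require users to reach some level of skill before they may use the software.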

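Dodging everything that is hard to understand or master, castrating an inherently complex man page before teaching it to beginners, and letting them believe that mastering technology is this simple (one book in hand and nothing to worry about) is not education at all. Real education should not dodge any complexity. It should teach by example: analyze one complex problem thoroughly for the students, then prompt them to solve other complex problems on their own. Below I dissect one man page in detail, using this one example to show the general patterns in how man pages are written and how a man page should be understood, so that the reader can generalize from it. I believe this article is more useful to beginners than the two rotten books above.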

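This is the man page for the regular-expression C functions in the POSIX standard. To use these functions, the reader must first have a very clear idea of regular expressions: what they can and cannot do, how to do it, and how to write them fluently. Every man page is highly cohesive and will not teach you such off-topic material. In other words, you must first be very clear about what you expect these functions to accomplish; if you don't know what you want to do yourself, the man page cannot help you.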

#2 | Original poster | Posted 2009-6-30 09:28:36

REGEX(3)                Linux Programmer’s Manual                REGEX(3)

NAME
   regcomp, regexec, regerror, regfree - POSIX regex functions

SYNOPSIS
   #include <sys/types.h>
   #include <regex.h>

   int regcomp(regex_t *preg, const char *regex, int cflags);

   int regexec(const regex_t *preg, const char *string, size_t nmatch,
               regmatch_t pmatch[], int eflags);

   size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                   size_t errbuf_size);

   void regfree(regex_t *preg);

#3 | Original poster | Posted 2009-6-30 09:30:39

This man page describes how to use four functions. All I originally wanted was to match a string against a regular expression and get the match result; that is, I wanted a function like this:

int my_expect_func(in: regular expression, in: target string, out: match result);
returns: error code

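Why are there four functions? Which one is closest to the function I want? What do the others do? This is a good reading habit: guess actively instead of passively receiving information. Understanding should be a process of comparing your guess with the text; if they agree, you understood correctly, and if not, you form a new guess and compare again. Receiving information entirely passively is not understanding.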

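Input and output parameters are an important hint. Linux library function prototypes are very regular: a const pointer is always an input parameter, and a non-const pointer always carries a value out (it may be an output parameter or an input-output parameter). So the prototype already tells you very clearly how to call the function, and there is no need for a code example. Look at the first function: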

   int regcomp(regex_t *preg, const char *regex, int cflags);

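preg is an output parameter: you allocate the object's memory beforehand and pass its address to regcomp. regex is an input parameter, and cflags holds flag bits. We don't know yet what preg is, but regex is regular expression, and it is of type char *, so that must be right. Without reading the description below, we can guess that the function is called like this: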

regex_t regobj;
regcomp(&regobj, "regular expression", FLAG1 | FLAG2 | ...);

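To stress it once more: to understand a piece of text, you must draw fully on experience and reasoning, guess actively, then read on to verify your guess, rather than receive information passively. How do we reason? The function above takes a regular expression and some flags and passes one value out, so it probably converts the regular expression into another format, right? That is reasoning. By contrast, suppose I ignored the fact that preg is an output parameter and is not a string, and insisted on forcing it into the shape of my_expect_func: since the regex parameter is the regular expression, preg must be the target string. That would not be reasoning or guessing; it would be stabbing in the dark.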
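The guessed call can be turned into a minimal sketch. The helper name compile_pattern is hypothetical, and REG_EXTENDED is one of the cflags values listed further down in the man page; unlike the guess above, the sketch checks the return value.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of the man page): compile `pattern`
 * into the caller-allocated object `re`. Returns 0 on success, or the
 * nonzero error code that regcomp() reported. */
int compile_pattern(regex_t *re, const char *pattern)
{
    /* `re` is the output parameter: the caller owns the regex_t itself
     * and regcomp() fills it in. REG_EXTENDED is one cflags bit. */
    int rc = regcomp(re, pattern, REG_EXTENDED);

    if (rc != 0)
        fprintf(stderr, "regcomp failed with code %d\n", rc);
    return rc;
}
```

A caller would declare `regex_t re;`, pass `&re`, and release it with `regfree(&re)` once a successful compile is no longer needed.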

#4 | Original poster | Posted 2009-6-30 09:32:22

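If you know something about how regular expressions work, you can use that experience to guess that this function probably converts the regular-expression string into a state machine so that target strings can be matched efficiently. If you have used regular-expression libraries in other programming languages, that experience also tells you that most of them have a preprocessing step before a regular expression is used. You also need some sensitivity to English abbreviations. The function name is regcomp; reg is regular expression, but is comp compare or compile? If it were compare, there should be two parameters of the same type to compare, as in strcmp. Here it is clearly compile: converting the string form into a binary form, which confirms the earlier guess from another angle. These conclusions come from experience rather than reasoning. Experience helps you understand faster and more accurately, but it is not required, because the reasoning above from input and output parameters had already led us to the right conclusion; an experienced reader is simply more confident in the guess.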
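The point of a separate compile step is that one compiled pattern can be reused for many target strings. A rough sketch of that idea, assuming the flags REG_EXTENDED and REG_NOSUB from the cflags list later in the man page; count_matches is a made-up name for illustration:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` once, then test each of the
 * `n` strings against it. Returns the number of strings that match,
 * or -1 if the pattern does not compile. */
long count_matches(const char *pattern, const char *const *texts, size_t n)
{
    regex_t re;
    long hits = 0;
    size_t i;

    /* REG_NOSUB: we only want yes/no answers, no match positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (i = 0; i < n; i++) {
        if (regexec(&re, texts[i], 0, NULL, 0) == 0)
            hits++;
    }

    regfree(&re);   /* release what regcomp() allocated */
    return hits;
}
```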

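Sensitivity to English abbreviations is the most basic skill needed for reading man pages and code, but it takes long practice to get the feel of it. Perhaps you don't need to know what a function name and its parameter names abbreviate in order to learn how to use the function; the two rotten books above will teach you that. But if you always avoid man pages and never practice guessing abbreviations, you will never be able to read other people's code. Writing code without reading other people's code, you won't even know how to name variables, and what you write will always be garbage. As for the name regcomp and its parameter names: regex is regular expression, and regcomp is regular expression compile. Then what is preg? reg is regular expression, but what does p stand for? Pointer? That is Microsoft's infamous Hungarian notation, which is surely not used on Linux; my guess is that p here means precompiled. What is the c in cflags? I don't know, but compare it with the next function: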

   int regexec(const regex_t *preg, const char *string, size_t nmatch,
               regmatch_t pmatch[], int eflags);

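This function has a parameter called eflags. So c is the c of regcomp and e is the e of regexec: one is the compile-time flags and the other the execution-time flags. The two kinds of flags must take different values, and the text below will surely describe them separately. This is another kind of guess: guessing the logic of the text that follows, which also helps understanding a great deal. How the names of the remaining functions and their parameters are abbreviated is left as an exercise for the reader.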

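preg is an output parameter in regcomp but an input parameter in regexec. By reasoning, preg is filled in by regcomp and then passed to regexec; that is, the regular expression is passed to regexec in its converted binary format. regexec also has a string input parameter, string, and two match parameters that represent the match result: pmatch is an output parameter giving the start address of a buffer, and nmatch gives the buffer's length (experience says this resembles strncpy). This must be the my_expect_func I wanted at the beginning: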

int my_expect_func(in: regular expression, in: target string, out: match result);
returns: error code

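preg corresponds to the regular expression, and pmatch and nmatch correspond to the match result, so the input parameter string must be the target string. pmatch is a pointer variable, but it is written pmatch[], which says that it points to a group of regmatch_t objects rather than one. How many are in the group? The nmatch parameter says. As with strncpy, we should allocate this group of regmatch_t objects beforehand and pass it to the function. So the two functions should be called like this: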

regex_t regobj;
regcomp(&regobj, "regular expression", FLAG1 | FLAG2 | ...);
regmatch_t matchbuf[10];
regexec(&regobj, "target string", 10, matchbuf, FLAG1 | FLAG2 | ...);

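How does a regmatch_t object represent a match? If a regular-expression pattern occurs five times in a target string, how are these five occurrences represented? We can guess that the regmatch_t structure must contain the position of the match within the target string. Also, if I pass in 10 regmatch_t objects and there are only five matches, how do I know after the function returns that the first five objects hold valid match information and the rest do not? Is the number of matches reported through a parameter or the return value? The function has no extra parameter, and a quick look at the RETURN VALUE section of the man page shows that the return value is an error code, not a match count. So the function must fill the unused regmatch_t objects at the end with a special value. This is reasoning. The guess will be confirmed or refuted as we read on; right or wrong, the answer will come later.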
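Put together, the two guessed calls look roughly like the sketch below. The helper name matches and its 1/0/-1 return convention are assumptions made for illustration; the 10-element buffer mirrors the guess above, and REG_EXTENDED comes from the cflags list later in the man page.

```c
#include <regex.h>

/* Hypothetical helper: returns 1 if `text` matches `pattern`, 0 if
 * regexec() reports anything other than a match, and -1 if the
 * pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;                /* caller-allocated, filled by regcomp() */
    regmatch_t matchbuf[10];   /* caller-allocated output buffer */
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch (10) tells regexec() how many elements matchbuf holds. */
    rc = regexec(&re, text, 10, matchbuf, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```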

#5 | Original poster | Posted 2009-6-30 09:32:53

There are two more functions:

   size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                     size_t errbuf_size);

   void regfree(regex_t *preg);

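From past experience, regerror is like perror or strerror: it translates an error code into a readable string. regfree is like free and is used to release preg. But isn't preg an object we allocated ourselves beforehand? If it was not dynamically allocated by this group of functions, why must this group of functions free it? This question leads to a new guess: the regex_t structure must have pointer members, and regcomp must dynamically allocate a block of memory and make a pointer member of preg point to it. That is why regfree is needed: it follows preg to its pointer member and releases the memory allocated earlier. Combine this with experience: regular expressions differ in length and complexity, so as state machines they need different numbers of states, and sizeof(regex_t) cannot be enough for the binary form of every regular expression; memory must be allocated dynamically. This kind of reasoning and guessing not only helps with how to use the functions but also gives some insight into how they are implemented, an ability that is especially important for reading code. Note that although the parameter of a memory-releasing function is an input parameter that passes out no meaningful value, it is not qualified with const in the prototype, because releasing memory is also a modification.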
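The guessed design can be illustrated with a toy. Everything below (struct toy_pattern, toy_compile, toy_free) is invented for this illustration and says nothing about how any real regex library lays out regex_t; it only shows why a caller-allocated object can still need a library-provided free function, and why that function takes a non-const pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for regex_t: a fixed-size struct with a pointer member. */
struct toy_pattern {
    char  *compiled;   /* variable-size data lives on the heap */
    size_t len;
};

/* Toy stand-in for regcomp(): the caller owns *p, but the function
 * allocates the variable-size part. Returns 0 on success, -1 on
 * allocation failure. */
int toy_compile(struct toy_pattern *p, const char *src)
{
    p->len = strlen(src);
    p->compiled = malloc(p->len + 1);
    if (p->compiled == NULL)
        return -1;
    memcpy(p->compiled, src, p->len + 1);
    return 0;
}

/* Toy stand-in for regfree(): releases only what toy_compile()
 * allocated. It modifies *p, hence no const on the parameter. */
void toy_free(struct toy_pattern *p)
{
    free(p->compiled);
    p->compiled = NULL;
    p->len = 0;
}
```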

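Having only read the SYNOPSIS, and not yet the description below it, we already more or less know how to use these functions. What did it take? 1. reasoning, 2. experience, 3. sensitivity to English abbreviations. Now let us read the description and verify the guesses as we go.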

DESCRIPTION
POSIX Regex Compiling
   regcomp()  is  used to compile a regular expression into a form that is
   suitable for subsequent regexec() searches.

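Sure enough, regcomp converts a regular expression into a binary format suitable for subsequent regexec() processing. The word subsequent says that regcomp is called first and regexec afterwards. When reading documentation, the words that express concepts and the words that express relationships between concepts are the most important. In terse documents such as man pages, the words that express relationships are especially easy to overlook, because they are less conspicuous than definitions and often pass by in a single word. As an exercise, watch for words in the text below that express relationships between concepts.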

#6 | Original poster | Posted 2009-6-30 09:33:36

   regcomp() is supplied with preg, a pointer to a pattern buffer  storage
   area;  regex, a pointer to the null-terminated string and cflags, flags
   used to determine the type of compilation.

   All regular expression searching must be done via  a  compiled  pattern
   buffer,  thus  regexec()  must always be supplied with the address of a
   regcomp() initialized pattern buffer.

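"preg, a pointer to a pattern buffer storage area" says that we must allocate the space for the preg object ourselves and then pass its address, that is preg, to regcomp. The man page will not say outright "allocate the space yourself and then pass it to me"; that would be too silly. You have to work out for yourself the message it really means to convey.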

   cflags may be the bitwise-or of one or more of the following:

   REG_EXTENDED
            Use POSIX Extended Regular Expression syntax  when  interpreting
            regex. If  not  set,  POSIX Basic Regular Expression syntax is
            used.

   REG_ICASE
            Do not differentiate case.  Subsequent regexec() searches  using
            this pattern buffer will be case insensitive.

   REG_NOSUB
            Support  for  substring  addressing  of matches is not required.
            The nmatch and pmatch parameters to regexec() are ignored if the
            pattern buffer supplied was compiled with this flag set.

   REG_NEWLINE
            Match-any-character operators don’t match a newline.

            A  non-matching list ([^...])  not containing a newline does not
            match a newline.

            Match-beginning-of-line operator (^) matches  the  empty  string
            immediately  after  a newline, regardless of whether eflags, the
            execution flags of regexec(), contains REG_NOTBOL.

            Match-end-of-line operator ($) matches the empty string  immedi‐
            ately  before  a  newline, regardless of whether eflags contains
            REG_NOTEOL.

POSIX Regex Matching
   regexec() is used to match a null-terminated string against the precom‐
   piled  pattern  buffer,  preg. nmatch  and pmatch are used to provide
   information regarding the location of any matches.  eflags may  be  the
   bitwise-or  of  one  or  both  of REG_NOTBOL and REG_NOTEOL which cause
   changes in matching behavior described below.

   REG_NOTBOL
            The match-beginning-of-line operator always fails to match  (but
            see  the  compilation  flag  REG_NEWLINE above) This flag may be
            used when different portions of a string are passed to regexec()
            and the beginning of the string should not be interpreted as the
            beginning of the line.

   REG_NOTEOL
            The match-end-of-line operator always fails to  match  (but  see
            the compilation flag REG_NEWLINE above)

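As guessed earlier, since cflags and eflags have different names they must take different values, and these values are usually combined with bitwise-or. This article is about how to read and understand a man page, not about the specific technique, so I won't explain in detail what each flag does. But let us do a few more abbreviation exercises, which help not only understanding but also memory: once you remember the common flags you need not consult the manual every time. REG_ICASE: ICASE means ignore case, a very common abbreviation. REG_NOSUB: SUB sometimes means substitute and sometimes substring; here it means substring. REG_NOTBOL: at first sight it is not clear what BOL is, but look at its counterpart REG_NOTEOL. From experience we know that EOF is end of file, so EOL should be end of line, and BOL, correspondingly, beginning of line.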
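For readers who want to see the compile-time and execution-time flags side by side, here is a small sketch. The helper try_match and its 1/0/-1 return convention are hypothetical; the flag names are the ones quoted from the man page above, and the patterns are POSIX basic regular expressions because REG_EXTENDED is not set.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` with `cflags`, run it on
 * `text` with `eflags`. Returns 1 for a match, 0 for no match (or a
 * regexec() failure), -1 if the pattern does not compile. */
int try_match(const char *pattern, const char *text, int cflags, int eflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB is added because only a yes/no answer is needed, so
     * nmatch and pmatch are ignored by regexec(). */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, `try_match("hello", "HELLO", REG_ICASE, 0)` uses a cflags bit, while `try_match("^a", "abc", 0, REG_NOTBOL)` uses an eflags bit that makes `^` fail at the start of the string.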

#7 | Original poster | Posted 2009-6-30 09:36:54

BYTE OFFSETS
   Unless  REG_NOSUB was set for the compilation of the pattern buffer, it
   is possible to obtain substring match addressing  information. pmatch
   must be dimensioned to have at least nmatch elements.  These are filled
   in by regexec() with substring match addresses.  Any  unused  structure
   elements will contain the value -1.

   The  regmatch_t  structure  which  is  the type of pmatch is defined in
   <regex.h>.

         typedef struct {
            regoff_t rm_so;
            regoff_t rm_eo;
         } regmatch_t;

   Each rm_so element that is not -1 indicates the  start  offset  of  the
   next  largest  substring  match  within the string.  The relative rm_eo
   element indicates the end offset of the match.

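Sure enough: we guessed that a regmatch_t object holds the position of a match, and that after regexec returns, the unused entries at the end of the group of regmatch_t objects are marked with a special value. That special value is -1. The position information consists of a start and an end, and one more guess tells us that rm_so means regmatch start offset and rm_eo means regmatch end offset. You need this kind of sensitivity: rm_so and rm_eo are identical except for s and e, and an s and e that denote opposite concepts are start and end, which is very common in program code. Another very common pattern is that structure member names carry a prefix that abbreviates the structure name, such as rm_ for regmatch here.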
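One detail the quoted text leaves vague is what each array element refers to. In POSIX, pmatch[0] describes the whole match and pmatch[i] the i-th parenthesized subexpression; the array does not list successive occurrences of the pattern. A minimal sketch, with find_groups as a hypothetical helper name:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match `text` against the extended regular
 * expression `pattern` and let regexec() fill pm[0..n-1].
 * Returns 1 for a match, 0 for no match, -1 on compile failure. */
int find_groups(const char *pattern, const char *text,
                regmatch_t *pm, size_t n)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, n, pm, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For the pattern `([a-z]+)@([a-z]+)`, the text `mail bob@example now`, and a 4-element array, pm[0] covers `bob@example` (rm_so 5, rm_eo 16), pm[1] covers `bob`, pm[2] covers `example`, and the unused pm[3] has both members set to -1. Note that rm_eo is the offset of the first character after the match.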

Posix Error Reporting
   regerror() is used to turn the error codes that can be returned by both
   regcomp() and regexec() into error message strings.

   regerror() is passed the error code, errcode, the pattern buffer, preg,
   a pointer to a character string buffer, errbuf, and  the  size  of  the
   string buffer, errbuf_size.  It returns the size of the errbuf required
   to contain the null-terminated error message string. If  both  errbuf
   and  errbuf_size  are  nonzero,  errbuf  is  filled  in  with the first
   errbuf_size - 1 characters of the error message and a terminating null.

POSIX Pattern Buffer Freeing
   Supplying  regfree()  with a precompiled pattern buffer, preg will free
   the memory allocated to the pattern buffer by  the  compiling  process,
   regcomp().

This also agrees with the earlier guess. regerror translates an error code into a readable string, and regfree releases the memory allocated inside the preg object.
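Because regerror() returns the buffer size it needs, it can be called twice: once to ask for the size and once to fetch the message. The sketch below assumes that idiom; regex_strerror is a made-up helper name, and the caller is responsible for freeing the result.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed copy of the message for
 * `errcode`, or NULL if allocation fails. The caller frees it. */
char *regex_strerror(int errcode, const regex_t *re)
{
    /* With errbuf_size == 0 nothing is written; the return value is
     * the size needed, including the terminating null byte. */
    size_t need = regerror(errcode, re, NULL, 0);
    char *buf = malloc(need);

    if (buf != NULL)
        regerror(errcode, re, buf, need);
    return buf;
}
```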

#8 | Original poster | Posted 2009-6-30 09:37:16

RETURN VALUE
   regcomp()  returns  zero  for a successful compilation or an error code
   for failure.

   regexec() returns zero for a successful match or REG_NOMATCH for  fail‐
   ure.

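To keep a uniform layout, man pages pull RETURN VALUE out into a section of its own, which has always bothered me. If one man page describes several functions, then after reading the description of each function you should jump here to see what it returns, rather than read all the other descriptions first and come here last. In fact this man page is not even consistent: the return value of regerror is given in the description above and not here. Clearly, listing return values separately at the end does not match how people write or read. The current state is poor: some return values are listed at the end, others appear in the descriptive text, and you have to hunt all over the manual for them. I consider this a major shortcoming of man pages. By contrast, what makes beginners uncomfortable (that man pages are too terse and have no code examples) is not a shortcoming but a strength.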

ERRORS
   The following errors can be returned by regcomp():

   REG_BADBR
            Invalid use of back reference operator.

   REG_BADPAT
            Invalid use of pattern operators such as group or list.

   REG_BADRPT
            Invalid  use  of  repetition  operators such as using ’*’ as the
            first character.

   REG_EBRACE
            Un-matched brace interval operators.

   REG_EBRACK
            Un-matched bracket list operators.

   REG_ECOLLATE
            Invalid collating element.

   REG_ECTYPE
            Unknown character class name.

   REG_EEND
            Non specific error.  This is not defined by POSIX.2.

   REG_EESCAPE
            Trailing backslash.

   REG_EPAREN
            Un-matched parenthesis group operators.

   REG_ERANGE
            Invalid use of the range operator, e.g., the ending point of the
            range occurs prior to the starting point.

   REG_ESIZE
            Compiled  regular  expression  requires  a pattern buffer larger
            than 64Kb.  This is not defined by POSIX.2.

   REG_ESPACE
            The regex routines ran out of memory.

   REG_ESUBREG
            Invalid back reference to a subexpression.

CONFORMING TO
   POSIX.1-2001.

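A student who had read this far asked me: it says above that regexec returns 0 on success and REG_NOMATCH on failure, so what does the error code REG_NOMATCH mean, and why isn't it explained in the ERRORS section? This is a typical case of incomplete understanding. The text says that regcomp returns 0 on success and an error code on failure without saying which error codes; they are listed in detail in the ERRORS section. regcomp can fail for many reasons, and most of these codes describe syntax errors in the regular expression. regexec, on the other hand, decides whether there is a match: it returns 0 for a match and REG_NOMATCH for no match. NOMATCH is simply no match; the sentence itself says what the code means, so it is not explained again under ERRORS. This also shows the terseness of man pages: not one wasted word.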

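Why did this student miss it? Again, insensitivity to English. To him REG_NOMATCH was just a string of capital letters, a symbol; he did not see "no match" in it, so he felt the symbol needed a detailed explanation later and did not realize that it does double duty here: it explains itself.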
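In code, the distinction between "the pattern is broken" and "the pattern simply did not match" might be handled as in this sketch. The helper match_or_report, its return convention, and the 128-byte message buffer are all assumptions for illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 for a match, 0 for REG_NOMATCH, and
 * -1 for any real error, after printing the regerror() message. */
int match_or_report(const char *pattern, const char *text)
{
    regex_t re;
    char msg[128];
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        /* One of the codes from the ERRORS section. */
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }

    rc = regexec(&re, text, 0, NULL, 0);
    if (rc != 0 && rc != REG_NOMATCH) {
        /* Not a plain "no match": report it before freeing re. */
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regexec: %s\n", msg);
    }
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```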

SEE ALSO
   grep(1), regex(7), GNU regex manual

COLOPHON
   This page is part of release 2.77 of the Linux  man-pages  project. A
   description  of  the project, and information about reporting bugs, can
   be found at http://www.kernel.org/doc/man-pages/.

GNU                            1998-05-08                         REGEX(3)

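The valuable part of this last section of the man page is SEE ALSO. Because every man page sticks to its own topic and does not digress, you sometimes need to read several related man pages together and form an overview from a series of related topics. Some man pages have a BUGS section, which is also very important. The classic case is gets(3): after describing at length what the function does, it ends by saying in the BUGS section "Never use gets()". If you miss that sentence, everything you read before it was wasted.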
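The usual replacement for gets() is fgets(), which takes the buffer size. A minimal sketch, assuming the caller wants the trailing newline removed; read_line is a hypothetical name, and the cast assumes the buffer size fits in an int.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `in` into `buf` (at most
 * size - 1 characters) and strip the trailing newline if present.
 * Returns 0 on success, -1 on end of file, read error, or size 0. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first newline */
    return 0;
}
```

Unlike gets(), the function cannot write past the end of buf; a line longer than the buffer is simply returned in pieces over several calls.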

#9 | Posted 2009-6-30 09:43:36

I'll post some English reading material in a bit too, also about GNU tools. These are the tools we electrical engineers make our living with.

#10 | Posted 2009-6-30 09:51:10

Bumping this, even though I can't follow it~ O(∩_∩)O~

#11 | Original poster | Posted 2009-6-30 09:52:54

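The person who wrote this article also has a Linux programming book released under the GNU Free Documentation License.
It has a certain commercial motive, but it is still worth a look:
http://djkings.javaeye.com/blog/218542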
