Empirical training for conditional random fields


Academic year: 2021


Zhemin Zhu z.zhu@utwente.nl

Djoerd Hiemstra D.Hiemstra@utwente.nl

Peter Apers P.M.G.Apers@utwente.nl

Andreas Wombacher a.wombacher@utwente.nl

CTIT Database Group, Drienerlolaan 5, 7500AE, Enschede, The Netherlands

Conditional Random Fields (CRFs) are undirected graphical models that have been widely applied to sequence labelling tasks such as part-of-speech tagging. Training CRFs (Lafferty et al., 2001) can be very expensive for large-scale applications (Sutton & McCallum, 2009). The standard training (SD) of CRFs needs to calculate the partition function Z_sd(X), which is a global summation over the whole graph. Piecewise training (PW) (Sutton & McCallum, 2009) speeds up training by approximating the partition function with an upper bound, but it still does not scale well with the variable cardinality. Another option for sequence labelling is directed models such as Maximum Entropy Markov Models (MEMMs) (McCallum et al., 2000), which can be trained efficiently. However, they suffer from the label bias problem (Lafferty et al., 2001), which may lead to low accuracy.
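To make the cost of this global summation concrete, the following is a minimal sketch of the log partition function of a linear-chain model, computed with the standard forward recursion and checked against brute-force enumeration. The score tables `emit` and `trans`, the function names, and the toy numbers are hypothetical illustrations, not the authors' notation or implementation.

```python
import math
from itertools import product


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emit, trans):
    """Forward recursion for log Z(X) of a linear chain.

    emit[t][k]  : assumed log-score of label k at position t
    trans[j][k] : assumed log-score of label j followed by label k
    Cost is O(T * K^2) for T positions and K labels.
    """
    num_labels = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][k]
            + logsumexp([alpha[j] + trans[j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emit, trans):
    """Sum over all K^T label sequences; only feasible for tiny inputs."""
    num_labels, length = len(trans), len(emit)
    scores = []
    for ys in product(range(num_labels), repeat=length):
        s = emit[0][ys[0]]
        for t in range(1, length):
            s += trans[ys[t - 1]][ys[t]] + emit[t][ys[t]]
        scores.append(s)
    return logsumexp(scores)


# Toy example: 3 positions, 2 labels.
emit = [[0.1, 0.5], [0.3, -0.2], [0.0, 0.7]]
trans = [[0.2, -0.1], [0.4, 0.0]]
log_z = log_partition(emit, trans)
```

In gradient-based standard training a quantity of this kind has to be recomputed for every training sentence at every iteration. With a tag space of 252 labels, each position already involves 252 × 252 = 63,504 label pairs.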

In this paper (Zhu et al., 2013), we present a practically scalable training method for CRFs called Empirical Training (EP). We show that standard training with unregularized log-likelihood can have many maximum likelihood estimates (MLEs). Empirical training has a unique closed-form MLE that can be calculated very quickly from the empirical distribution. The MLE of empirical training is also one of the MLEs of standard training, so empirical training can be competitive in precision with standard and piecewise training. We also show that empirical training is unaffected by the label bias problem even though it is a locally normalized model. Experiments on two real-world NLP datasets show that empirical training reduces the training time from weeks to seconds and obtains results competitive with standard and piecewise training on linear-chain CRFs, especially when training data are insufficient.
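The closed-form estimator itself is not given here; it is defined in the technical report (Zhu et al., 2013). As a rough illustration of why a quantity "calculated from the empirical distribution" needs only a single counting pass over the training data, the sketch below estimates an empirical conditional distribution of a tag given the previous tag and the current word. The choice of conditioning context, the function name, the `<s>` start symbol, and the toy data are all assumptions made for illustration, not the authors' definition.

```python
from collections import Counter, defaultdict


def empirical_conditionals(tagged_sentences):
    """One pass of counting, then normalisation per context.

    Returns {(previous_tag, word): {tag: relative frequency}}.
    The (previous_tag, word) context is an assumed, illustrative choice.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical start-of-sentence symbol
        for word, tag in sentence:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {
        context: {tag: n / sum(c.values()) for tag, n in c.items()}
        for context, c in counts.items()
    }


# Toy training data (made up for this example).
toy = [
    [("the", "DET"), ("run", "NOUN")],
    [("the", "DET"), ("run", "NOUN")],
    [("the", "DET"), ("run", "VERB")],
]
probs = empirical_conditionals(toy)
```

Here `probs[("DET", "run")]` assigns relative frequency 2/3 to `NOUN` and 1/3 to `VERB`. No iterative optimisation is involved, which is consistent with training times measured in seconds, and contexts never seen in training get no estimate at all.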

Experiment 1. The Brown Corpus is used for the part-of-speech (POS) tagging experiment. The tag space contains 252 tags; 32,623 sentences are used for training and 1,000 sentences for testing.

Table 1: Part-of-Speech Tagging Accuracy

Metric        EP    SD         PW         PWPL
Accuracy (%)  95.6  95.4       82.9       82.4
Time (s)      3.9   4,571,807  3,791,648  261,021
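The training times in Table 1 are easier to compare once converted to days and to ratios against EP. The short calculation below does only that; the variable names are illustrative, and the numbers are copied from the table.

```python
# Training times in seconds, copied from Table 1.
times_s = {"EP": 3.9, "SD": 4_571_807, "PW": 3_791_648, "PWPL": 261_021}

SECONDS_PER_DAY = 24 * 60 * 60

# Wall-clock training time expressed in days.
days = {name: t / SECONDS_PER_DAY for name, t in times_s.items()}

# How many times longer each method takes than empirical training (EP).
slowdown_vs_ep = {name: t / times_s["EP"] for name, t in times_s.items()}

for name in times_s:
    print(f"{name:5s} {days[name]:10.4f} days  {slowdown_vs_ep[name]:12.0f}x EP")
```

Standard training took about 53 days and piecewise training about 44 days, against under four seconds for EP. That is a ratio of roughly one million to one for SD and PW.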

The method may also suffer from some potential drawbacks. With large feature vectors the empirical probabilities may become sparse, so generalisation from the training data to the test data may be a problem. In addition, we did not try global features in the experiments, so there is no evidence that the method works well with them. Nevertheless, the method is very fast and could be very useful for practitioners who apply CRFs to large-scale data sets.

Acknowledgments

This work has been supported by the Dutch national program COMMIT/.

References

Lafferty, J. D., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML (pp. 282–289).

McCallum, A., Freitag, D., & Pereira, F. C. N. (2000). Maximum entropy Markov models for information extraction and segmentation. ICML (pp. 591–598).

Sutton, C. A., & McCallum, A. (2009). Piecewise training for structured prediction. Machine Learning, 77, 165–194.

Zhu, Z., Hiemstra, D., Apers, P. M. G., & Wombacher, A. (2013). Closed form maximum likelihood estimator of conditional random fields (Technical Report TR-CTIT-13-03). CTIT, University of Twente.
