Questions tagged [natural-language-processing]

Natural language processing (NLP)

Natural language processing studies the algorithmic analysis and production of texts in human languages. It is closely related to computational linguistics. Modern NLP techniques make heavy use of machine learning.

199 questions
32
votes
5 answers

Finding interesting anagrams

Say that $a_1a_2\ldots a_n$ and $b_1b_2\ldots b_n$ are two strings of the same length. An anagramming of two strings is a bijective mapping $p:[1\ldots n]\to[1\ldots n]$ such that $a_i = b_{p(i)}$ for each $i$. There might be more than one…
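To make the definition above concrete, here is a minimal sketch that builds one valid mapping $p$ for two strings, if any exists. The function name `find_anagramming` is hypothetical, and the indices are 0-based rather than the 1-based $[1\ldots n]$ of the question. It returns just one of possibly many anagrammings and does not try to pick an "interesting" one.

```python
from collections import defaultdict, deque


def find_anagramming(a: str, b: str):
    """Return a list p with a[i] == b[p[i]] for every i, or None if impossible.

    Hypothetical helper for illustration; indices are 0-based.
    """
    if len(a) != len(b):
        return None

    # For each character, remember the positions where it occurs in b.
    positions = defaultdict(deque)
    for j, ch in enumerate(b):
        positions[ch].append(j)

    # Assign each character of a to the next unused matching position in b.
    p = []
    for ch in a:
        if not positions[ch]:
            return None  # b has too few copies of ch, so no bijection exists
        p.append(positions[ch].popleft())
    return p
```

Because every position of `b` is handed out at most once and the two strings have equal length, a returned list is always a bijection on the index set.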
28
votes
9 answers

Are programming languages becoming more like natural languages?

Can we study programming languages in the context of linguistics? Do programming languages evolve naturally in similar ways to natural languages? Although full rationality and mathematical consistency are essential to programming languages, there…
25
votes
1 answer

Compression of domain names

I am curious as to how one might very compactly compress the domain of an arbitrary IDN hostname (as defined by RFC5890) and suspect this could become an interesting challenge. A Unicode host or domain name (U-label) consists of a string of Unicode…
18
votes
4 answers

Relation and difference between information retrieval and information extraction?

From Wikipedia: Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text indexing. From…
Tim
13
votes
2 answers

Identifying events related to dates in a paragraph

Is there an algorithmic approach to identify that dates given in a paragraph correlate to particular events (phrases) in the paragraph? For example, consider the following paragraph: In June 1970, the great leader took the oath. But it was only after…
check123
10
votes
1 answer

Implementation of Naive Bayes

I am implementing a Naive Bayes algorithm for text categorization with Laplacian smoothing. The problem I am having is that the probability approaches zero because I am multiplying many small fractions. Therefore, the probability eventually yields…
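The underflow described above can be reproduced in a few lines. The sketch below also shows one common remedy, which is an assumption here and not something the excerpt states: compare classes by the sum of log-probabilities instead of the product. The function name `log_score` and the numbers are illustrative only. Laplacian smoothing keeps every likelihood strictly positive, so the logarithms are defined.

```python
import math


def log_score(prior: float, likelihoods: list[float]) -> float:
    """Log of prior * product(likelihoods), computed as a sum of logs.

    Hypothetical helper for illustration; all inputs must be > 0.
    """
    return math.log(prior) + sum(math.log(p) for p in likelihoods)


# Illustrative values: 1000 word likelihoods of 1e-4 each.
word_likelihoods = [1e-4] * 1000

# The direct product underflows to zero in double precision.
direct_product = math.prod(word_likelihoods)

# The log-space score stays a finite, comparable number.
score = log_score(0.5, word_likelihoods)
```

Since the logarithm is monotonic, the class with the highest log-space score is also the class with the highest probability, so the classification decision is unchanged.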
9
votes
2 answers

What are some efficient ways to find the differences between two large corpora of text that have similar, but differently ordered content?

I have two large files containing paragraphs of English text: The first text is about 200 pages long and has about 10 paragraphs per page (each paragraph is 5 sentences long). The second text contains almost precisely the same paragraphs and text…
8
votes
1 answer

nlp: phonetic edit distance between a word and the closest of a set of words

Let's say someone is using Dragon Dictation, Google Speech, or some other free-form dictation software (it will recognize anything they say to the best of its ability). I have some reasonably large set of words and I'm certain the speaker is trying…
7
votes
2 answers

Are syntax and semantics just two structures such that one is a model of the other?

The syntax of a language is a structure. The semantics of a language is a structure. The semantics of a language is a model of its syntax. And that's all? Is the syntax/semantics duality just model theory applied to languages? (A short answer…
7
votes
3 answers

Complexity of natural language processing problems

Which natural language processing problems are NP-Complete or NP-Hard? I've searched the natural-lang-processing and complexity-theory tags (and related complexity tags), but have not turned up any results. None of the NLP questions that are…
6
votes
1 answer

Measuring the information of a document?

I'd like to measure how much information a document $D$ contains. Clearly, the New York Times published yesterday contains more information than my diary written on the same day. But I do not know how to quantify those differences. I think there are…
6
votes
1 answer

How to calculate an accurate estimated reading time of text?

I suppose the calculation should not rely on only two factors (average reading speed in words per minute, and word count), but on at least a third parameter, which in my opinion should measure the difficulty of the vocabulary used with some kind of…
5
votes
2 answers

How to represent text for a program to add punctuation to a block of text?

I want to try to make a program where I remove the punctuation from a block of text and it then reinserts it in the correct places. I have started with a simpler version using just text and spaces, with no commas, full stops, etc. The first angle I have…
5
votes
0 answers

Is English recursively enumerable?

The title says it all. I've tried digging up debate on this issue to see a proof one way or the other but it doesn't look like anyone is able to say whether or not it is. Clearly there are recursive structures in English, and that's about all anyone…
5
votes
2 answers

Algorithm to find the probability of a given text to be about a large topic

I want the conditional probability for each topic (the topic being the word that we give as input). For example, take the text: have seen and reviewed your requirements you posted here. If you can give me the fix criteria/category of your data mining then…
Chatur